Re: pgbench - rework variable management

2019-09-02 Thread Thomas Munro
On Tue, Sep 3, 2019 at 4:57 PM Fabien COELHO  wrote:
> I noticed, but I do not have any windows host so I cannot test locally.
>
> The issue is how to do a mutex on Windows, which does not have pthread so
> it has to be emulated. I'll try again by sending a blind update to the
> patch and see how it goes.

If you have the patience and a GitHub account, you can push your code
onto a public GitHub branch having also applied the patch mentioned at
https://wiki.postgresql.org/wiki/Continuous_Integration, go to
appveyor.com and tell it to watch your GitHub account, and then it'll
build and test every time you push a new tweak.  It takes a few minutes
to get the answer each time you try something, but I have managed to
get things working on Windows that way.

-- 
Thomas Munro
https://enterprisedb.com




Re: block-level incremental backup

2019-09-02 Thread Dilip Kumar
On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke
 wrote:
>
0003:
+/*
+ * When to send the whole file, % blocks modified (90%)
+ */
+#define WHOLE_FILE_THRESHOLD 0.9

How was this threshold selected?  Is it based on some test?


- magic number, currently 0 (4 bytes)
I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC
0x494E4352) as the magic number, not 0.


+ Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ));
+
+ buf = (char *) malloc(statbuf->st_size);
+ if (buf == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+
+ if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
+ {
+ Bitmapset  *mod_blocks = NULL;
+ int nmodblocks = 0;
+
+ if (cnt % BLCKSZ != 0)
+ {

It would be good to add some comments for the if block and also for the
assert.  Actually, it's not very clear from the code.

0004:
+#include 
+#include 
+#include 
Header file include order (sys/stat.h should come before time.h).



+ printf(_("%s combines full backup with incremental backup.\n\n"), progname);
/backup/backups


+ * scan_file
+ *
+ * Checks whether given file is partial file or not.  If partial, then combines
+ * it into a full backup file, else copies as is to the output directory.
+ */

/If partial, then combines/ If partial, then combine



+static void
+combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir,
+   const char *subdirpath, const char *outfn)
+ /*
+ * Open all files from all incremental backup directories and create a file
+ * map.
+ */
+ basefilefound = false;
+ for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++)
+ {
+ fm = &filemaps[fmindex];
+
.
+ }
+
+
+ /* Process all opened files. */
+ lastblkno = 0;
+ modifiedblockfound = false;
+ for (i = 0; i < fmindex; i++)
+ {
+ char*buf;
+ int hsize;
+ int k;
+ int blkstartoffset;
..
+ }
+
+ for (i = 0; i <= lastblkno; i++)
+ {
+ char blkdata[BLCKSZ];
+ FILE*infp;
+ int offset;
...
+ }
}

Can we break this function down into 2-3 functions?  At least creating a
file map could directly go into a separate function.
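As a hedged illustration of that suggestion, the file-map step could become
its own small function along these lines (everything here -- the names, the
array representation, the fixed block limit -- is invented for the sketch;
the real patch works with partial-file headers and Bitmapsets):

```c
/* Toy version of the "build a file map first" refactoring suggested above:
 * given the incremental backup files ordered newest-first, record for each
 * block which file holds its newest copy.  Invented names and limits, for
 * illustration only. */
#include <stdbool.h>

#define MAX_BLOCKS 16

/* blockmap[blkno] = index (into the newest-first file list) of the file
 * that supplies that block, or -1 if only the base file has it. */
void
create_filemap(bool modified[][MAX_BLOCKS], int nfiles,
			   int nblocks, int blockmap[MAX_BLOCKS])
{
	for (int blk = 0; blk < nblocks; blk++)
	{
		blockmap[blk] = -1;
		for (int f = 0; f < nfiles; f++)
		{
			if (modified[f][blk])
			{
				blockmap[blk] = f;	/* newest copy wins */
				break;
			}
		}
	}
}
```

With a helper of this shape, combine_partial_files() would only loop over
the map and copy blocks, which should make both halves easier to review.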

I have read the 0003 and 0004 patches; the above are a few cosmetic comments.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Patch to add hook to copydir()

2019-09-02 Thread Peter Eisentraut
On 2019-09-02 22:16, Swen Kooij wrote:
> Is there anything that I am missing? My early experiments have been
> very promising but my experience with Postgres internals is limited. Any
> help or feedback would be much appreciated.

You might want to review several previous threads that were
contemplating doing some reflink stuff with btrfs during database
creation.  Not exactly what you are doing, but related.

Also, wouldn't it work if your extension just defined its own copydir()
in the .so?  That should then be used over the built-in one -- maybe.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fix "Success" error messages

2019-09-02 Thread Peter Eisentraut
On 2019-08-27 08:27, Michael Paquier wrote:
> Thanks for the new patch, and you are right that pg_checksums has been
> slacking here.  There is the same issue with pg_verify_checksums in
> 11.  Not sure that's worth a back-patch though.  Those parts could
> find their way to v12 easily.

Committed to master and PG12 with your suggested changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: roll pg_stat_statements into core

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 12:07:17PM -0400, Tom Lane wrote:
> Euler Taveira  writes:
>> At least if pg_stat_statements
>> was in core you could load it by default and have a GUC to turn it
>> on/off without restarting the server (that was Magnus proposal and
>> Andres agreed).
> 
> That assertion is 100% bogus.  To turn it on or off on-the-fly,
> you'd need some way to acquire or release its shared memory
> on-the-fly.
> 
> It's probably now possible to do something like that, using the
> DSM mechanisms, but the code for it hasn't been written (AFAIK).
> And it wouldn't have much to do with whether the module was
> in core or stayed where it is.

If we were to actually merge the module into core and switch to DSM
instead of the current fixed amount of shared memory defined at start
time, then that would be a two-step process: first push the functions
into core with a GUC_POSTMASTER as currently done, and second, attempt
to switch the GUC to be reloadable.

FWIW, I am not sure that we should have the module in core.
--
Michael


signature.asc
Description: PGP signature


Re: Contribution to Perldoc for TestLib module in Postgres

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 01:48:14PM -0400, Alvaro Herrera wrote:
> Agreed ... that's pretty much the same thing I tried to say upthread.  I
> wrote my own synopsis, partly using your suggestions.  I reworded the
> description for the routines (I don't think there's any I didn't touch),
> added a mention of $windows_os, added a =head1 to split out the ad-hoc
> routines from the Test::More wrappers.
> 
> And pushed.
> 
> Please give it another look.  It might need more polish.

Thanks for committing.  I have read through the commit and I am not
noticing any issue worth sorting out.  One thing might be to give a short
description of some routines' arguments, like check_mode_recursive's, but
I think that we could live without that, too.
--
Michael


signature.asc
Description: PGP signature


Re: SIGQUIT on archiver child processes maybe not such a hot idea?

2019-09-02 Thread Kyotaro Horiguchi
At Tue, 3 Sep 2019 02:42:19 +, "Tsunakawa, Takayuki" 
 wrote in 
<0A3221C70F24FB45833433255569204D1FD1609C@G01JPEXMBYT05>
> From: Kyotaro Horiguchi [mailto:horikyota@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
> 
> The same idea that you referred to as pg_system occurred to me,
> too.  But I wondered if the archiver process can get the pid of
> its child (shell? archive_command?), while keeping the
> capabilities of system() (= the shell).  Even if we fork() and
> then system(), doesn't the OS send SIGQUIT to any descendents

I imagined that we wouldn't use system(3) at all.  pg_system would be
an almost-simplified version of system(3), but implemented without
calling it.

> of the archiver when postmaster sends SIGQUIT to the child
> process group?

Mmmm... that's bad.  Commit 3ad0728c81 intends to send SIGQUIT to the
programs running under system() for the archiver process.  I found the
discussion for that patch.

https://www.postgresql.org/message-id/flat/24849.1163826231%40sss.pgh.pa.us#c987fc18a4b9113c6276ae8b5d85abba

> Reimplementing system() seems pretty ugly, but maybe we have no choice.
> It strikes me that system() has a race condition as defined anyway,
> because if a signal arrives between blocking the handler and issuing the
> fork(), it'll disappear into the ether; and the same at the end of the
> routine.
..
>> Alternatively perhaps Postgres really ought to be using USR1/USR2 or other
>> signals that library routines won't think they have any business rearranging.
>
> The existing signal assignments were all picked for what seem to me
> to be good reasons; I'm disinclined to change them.  In any case, the
> important point here is that we'd really like an archive or recovery
> script, or for that matter any command executed via system() from a
> backend, to abort when the parent backend is SIGINT'd or SIGQUIT'd.

But now we know that sending it to grandchildren is wrong, in the
sense that it leads to unwanted core files being left behind.  That
behavior was already known at the time, though.

So now I know that we need to revert that to some extent if we want
to stop the core-dumping behavior...


I'm inclined to think that we should just document that behavior.


> > # Is there any means to view the whole of a thread from archive?
> > # I'm a kind of reluctant to wander among messages like a rat in
> > # a maze:p
> 
> You can see the whole thread by clicking the "Whole Thread" link.

What??? Oops!!! I found it among "Whole Message", "Download mbox"
and "Resend email".  Sorry for my slippery eyes, and thanks!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: psql - improve test coverage from 41% to 88%

2019-09-02 Thread Fabien COELHO


Attached is a rebase after TestLib.pm got a documentation in 6fcc40b1.

The attached patch improves psql code coverage by adding a specific TAP test. 
The 1709 tests take 4 seconds of CPU time (6.3 seconds elapsed) on my laptop.


The infrastructure is updated to require the Perl module "Expect", making it 
possible to test interactive features such as tab completion and prompt changes.


Coverage in "src/bin/psql" jumps from 40.0% of lines and 41.9% of functions 
to 78.4% of lines and 98.1% of functions with "check-world".


--
Fabien.

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b7d36b65dd..aabc27ab6f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -506,6 +506,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*checksum verification failed/s],
 	'pg_basebackup reports checksum mismatch');
@@ -526,6 +527,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*further.*failures.*will.not.be.reported/s],
 	'pg_basebackup does not report more than 5 checksum mismatches');
@@ -542,6 +544,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
 $node->command_checks_all(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ],
 	1,
+	'',
 	[qr{^$}],
 	[qr/^WARNING.*7 total checksum verification failures/s],
 	'pg_basebackup correctly report the total number of checksum mismatches');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 59228b916c..cf4811d382 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -62,6 +62,7 @@ sub check_relation_corruption
 			'--filenode',   $relfilenode_corrupted
 		],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data for single relfilenode on tablespace $tablespace"
@@ -71,6 +72,7 @@ sub check_relation_corruption
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/Bad checksums:.*1/],
 		[qr/checksum verification failed/],
 		"fails with corrupted data on tablespace $tablespace");
@@ -203,6 +205,7 @@ sub fail_corrupt
 	$node->command_checks_all(
 		[ 'pg_checksums', '--check', '-D', $pgdata ],
 		1,
+		'',
 		[qr/^$/],
 		[qr/could not read block 0 in file.*$file\":/],
 		"fails for corrupted data in $file");
diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index 3b63ad230f..ebe9b80a52 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -33,6 +33,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_controldata', $node->data_dir ],
 	0,
+	'',
 	[
 		qr/WARNING: Calculated CRC checksum does not match value stored in file/,
 		qr/WARNING: invalid WAL segment size/
diff --git a/src/bin/pg_resetwal/t/002_corrupted.pl b/src/bin/pg_resetwal/t/002_corrupted.pl
index f9940d7fc5..1990669d26 100644
--- a/src/bin/pg_resetwal/t/002_corrupted.pl
+++ b/src/bin/pg_resetwal/t/002_corrupted.pl
@@ -30,6 +30,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it/
@@ -46,6 +47,7 @@ close $fh;
 command_checks_all(
 	[ 'pg_resetwal', '-n', $node->data_dir ],
 	0,
+	'',
 	[qr/pg_control version number/],
 	[
 		qr/\Qpg_resetwal: warning: pg_control specifies invalid WAL segment size (0 bytes); proceed with caution\E/
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..01010914fe 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -50,7 +50,7 @@ sub pgbench
 
 	push @cmd, @args;
 
-	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	$node->command_checks_all(\@cmd, $stat, '', $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..b58f3548c3 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -25,7 +25,7 @@ sub pgbench
 	my ($opts, $stat, $out, $err, $name) = @_;
 	print STDERR "opts=$opts, stat=$stat, out=$out, err=$err, name=$name";
 	command_checks_all([ 'pgbench', split(/\s+/, $opts) ],
-		$stat, $out, $err, $name);
+		$stat, '', $out, $err, $name);
 	return;
 }
 
@@ -52,7 +52,7 @@ sub pgbench_scripts
 			push @cmd, '-f', $filename;
 		}
 	}
-	command_checks_all(\@cmd, $stat, $out, $err, $name);
+	command_checks_all(\@cmd, $stat

Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Michael Paquier
On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.

That's one issue.

> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

Possibly.  I don't dislike the idea of piling up the progress
information for cascading calls and I would use that with a depth
counter and a fixed-size array.  
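A toy model of that approach could look like the following (all names, and
the depth limit of 4, are invented for illustration; this is not the
backend's actual progress-reporting API):

```c
/* Minimal model of the "depth counter + fixed-size array" idea for nested
 * progress reporting.  Invented names and limits, for illustration only. */
#include <stdbool.h>

#define PROGRESS_STACK_DEPTH 4

typedef enum
{
	PROGRESS_COMMAND_INVALID,
	PROGRESS_COMMAND_CLUSTER,
	PROGRESS_COMMAND_CREATE_INDEX
} ProgressCommandType;

static ProgressCommandType progress_stack[PROGRESS_STACK_DEPTH];
static int	progress_depth = 0;

/* Push a new command's progress entry; nested calls stack up. */
bool
progress_start_command(ProgressCommandType cmd)
{
	if (progress_depth >= PROGRESS_STACK_DEPTH)
		return false;			/* too deeply nested; ignore */
	progress_stack[progress_depth++] = cmd;
	return true;
}

/* Pop only if the topmost entry matches the ending command;
 * PROGRESS_COMMAND_INVALID clears everything (commit/abort). */
bool
progress_end_command(ProgressCommandType cmd)
{
	if (cmd == PROGRESS_COMMAND_INVALID)
	{
		progress_depth = 0;
		return true;
	}
	if (progress_depth == 0 || progress_stack[progress_depth - 1] != cmd)
		return false;			/* mismatched end; leave the stack alone */
	progress_depth--;
	return true;
}

/* The command whose progress a monitoring view would currently show. */
ProgressCommandType
progress_current_command(void)
{
	return progress_depth > 0 ? progress_stack[progress_depth - 1]
		: PROGRESS_COMMAND_INVALID;
}
```

The point of the model is that a CLUSTER that internally runs index builds
could stack a CREATE INDEX report on top of its own, and ending a command
of the wrong type would leave the stack untouched.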

> BTW the following condition in pgstat_progress_end_command() seems to
> be wrong. We should return from the function when either
> pgstat_track_activities is disabled or the current progress command is
> invalid.
> 
>if (!pgstat_track_activities
>   && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
>   return;
> 
> I've attached the patch fixes the issue I newly found.

Indeed, good catch.  This is wrong since b6fb647 which has introduced
the progress reports.  I'll fix that one and back-patch if there are
no objections.

With my RMT hat on for v12, I don't think that it is really the moment
to discuss how we want to change this API post beta3, and we have room
for that in future development cycles.  There are quite a few questions
which need answers that I am unsure of:
- Do we really want to support nested calls of progress reports for
multiple commands?
- Perhaps for some commands it makes sense to have an overlap of the
fields used, but we need a clear definition of what can be done or
not.  I am not really comfortable with the idea of having in
reindex_relation() a progress report related only to CLUSTER, which is
also a REINDEX code path.  The semantics shared between both commands
need to be thought a bit more.  For example
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to
report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command
type, which would be just wrong for a CLUSTER command.
- Which command should be reported to the user, only the upper-level
one?
- Perhaps we can live only with the approach of not registering a new
command if one already exists, and actually be more flexible with the
phase fields used, in short we use unique numbers for each phase?

The most conservative bet from a release point of view, and actually
my bet because that's safe, would be to basically revert 6f97457
(CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX which has
overlaps with REINDEX in the build and validation paths).  What I am
really scared of is that we have just barely scratched the surface of
the issues caused by the inter-dependencies between all the code
paths of those commands, and that we have much more waiting behind
this open item.
--
Michael


signature.asc
Description: PGP signature


Re: row filtering for logical replication

2019-09-02 Thread Erik Rijkers

On 2019-09-03 05:32, Euler Taveira wrote:
On Tue, Sep 3, 2019 at 00:16, Alexey Zagarin wrote:


There are complaints in the log (both pub and sub) like:
ERROR: trying to store a heap tuple into wrong type of slot

I have no idea what causes that.

Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 
955 of 0005-Row-filtering-for-logical-replication.patch it should be 
&TTSOpsHeapTuple instead of &TTSOpsVirtual.


Oops... exactly.  That was an oversight while poking at different types 
of slots.


OK, I'll consider Alexey Kondratov's set of patches as the current 
state-of-the-art then.  (They still apply.)


I found a problem where I'm not sure it's a bug:

The attached bash script does a test by setting up pgbench tables on 
both master and replica, and then sets up logical replication for a 
slice of pgbench_accounts.  Then it does a short pgbench run, and loops 
until the results become identical (OK), or breaks out after a certain 
time (NOK = not OK).


It turns out this did not work until I added a wait state after the 
CREATE SUBSCRIPTION.  It always fails without the wait state, and always 
works with the wait state.


Do you agree this is a bug?


thanks (also to both Alexeys :))


Erik Rijkers


PS
by the way, this script won't run as-is on other machines; it has stuff 
particular to my local setup.



#!/bin/bash

# postgres binary compiled with 
#
# pgpatches/0130/logrep_rowfilter/20190902/v2-0001-Remove-unused-atttypmod-column-from-initial-table.patch
# pgpatches/0130/logrep_rowfilter/20190902/v2-0002-Store-number-of-tuples-in-WalRcvExecResult.patch   
# pgpatches/0130/logrep_rowfilter/20190902/v2-0003-Refactor-function-create_estate_for_relation.patch     
# pgpatches/0130/logrep_rowfilter/20190902/v2-0004-Rename-a-WHERE-node.patch      
# pgpatches/0130/logrep_rowfilter/20190902/v2-0005-Row-filtering-for-logical-replication.patch    
# pgpatches/0130/logrep_rowfilter/20190902/v2-0006-Print-publication-WHERE-condition-in-psql.patch    
# pgpatches/0130/logrep_rowfilter/20190902/v2-0007-Publication-where-condition-support-for-pg_dump.patch  
# pgpatches/0130/logrep_rowfilter/20190902/v2-0008-Debug-for-row-filtering.patch      
# pgpatches/0130/logrep_rowfilter/20190902/v2-0009-Add-simple-BDR-test-for-row-filtering.patch
 

unset PGDATABASE PGPORT PGSERVICE
export PGDATABASE=postgres

root_dir=/tmp/cascade/logrep_rowfilter

mkdir -p $root_dir

BIN=$HOME/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin

export PATH=$BIN:$PATH

  initdb=$BIN/initdb
postgres=$BIN/postgres
  pg_ctl=$BIN/pg_ctl
baseport=6525
   port1=$(( $baseport + 0 )) 
   port2=$(( $baseport + 1 ))
 appname=rowfilter

num_instances=2
scale=1  where="where (aid between 4 and 5-1)"
 #  scale=10 where="where (aid between 40 and 40+5-1)"
  clients=64
 duration=20
 wait=10  BASTA_COUNT=40  #   7200   #  wait seconds in total  

if [[ -d $root_dir/instance1 ]]; then rm -rf $root_dir/instance1; fi
if [[ -d $root_dir/instance2 ]]; then rm -rf $root_dir/instance2; fi
if [[ -d $root_dir/instance1 ]]; then exit ; fi
if [[ -d $root_dir/instance2 ]]; then exit ; fi

devel_file=/tmp/bugs
echo filterbug>$devel_file

for n in `seq 1 $num_instances`
do
  instance=instance$n
  server_dir=$root_dir/$instance
  data_dir=$server_dir/data
  port=$(( $baseport + $n -1 ))
  logfile=$server_dir/logfile.$port
  echo "-- $initdb --pgdata=$data_dir --encoding=UTF8 --pwfile=$devel_file "
   $initdb --pgdata=$data_dir --encoding=UTF8 --pwfile=$devel_file  &> /dev/null 
  ( $postgres  -D $data_dir -p $port \
--wal_level=logical --logging_collector=on \
--client_min_messages=warning \
--log_directory=$server_dir --log_filename=logfile.${port} \
--log_replication_commands=on & ) &> /dev/null
done 

echo "sleep 3s"
sleep 3

echo "
  drop table if exists pgbench_accounts;
  drop table if exists pgbench_branches;
  drop table if exists pgbench_tellers;
  drop table if exists pgbench_history;" | psql -qXp $port1 \
&& echo "
  drop table if exists pgbench_accounts;
  drop table if exists pgbench_branches;
  drop table if exists pgbench_tellers;
  drop table if exists pgbench_history;" | psql -qXp $port2 \
&& pgbench -p $port1 -qis $scale \
&& echo "alter table pgbench_history add column hid serial primary key;" \
  | psql -q1Xp $port1 && pg_dump -F c -p $port1 \
 --exclude-table-data=pgbench_history  \
 --exclude-table-data=pgbench_accounts \
 --exclude-table-data=pgbench_branches \
 --exclude-table-data=pgbench_tellers  \
   -t pgbench_history -t pgbench_accounts \
   -t pgbench_branches -t pgbench_tellers \
  | pg_restore -1 -p $port2 -d postgres


pub1=pub_${port1}_to_${port2}
sub1=sub_${port2}_fr_${port1}
echo -ne "
cr

Re: SIGQUIT on archiver child processes maybe not such a hot idea?

2019-09-02 Thread Thomas Munro
On Tue, Sep 3, 2019 at 2:43 PM Tsunakawa, Takayuki
 wrote:
> From: Kyotaro Horiguchi [mailto:horikyota@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
>
> The same idea that you referred to as pg_system occurred to me, too.  But I 
> wondered if the archiver process can get the pid of its child (shell? 
> archive_command?), while keeping the capabilities of system() (= the shell).  
> Even if we fork() and then system(), doesn't the OS send SIGQUIT to any 
> descendents of the archiver when postmaster sends SIGQUIT to the child 
> process group?

So, to recap what's happening here, we have a tree of processes like this:

postmaster
-> archiver
   -> sh
  -> cp [user-supplied archiving command]

The archiver is a process group leader, because it called setsid().
The postmaster's signal_child() does kill(pid, ...) and also
kill(-pid, ...), so the kernel sends SIGQUIT to the archiver (twice), sh,
and cp.  As for what they do with the signal, it depends on timing:

1.  The archiver normally exits immediately in pgarch_exit(), but
while system() is running, SIGQUIT and SIGINT are ignored (see POSIX).
2.  sh normally uses SIGINT to break out of loops etc, but while it's
waiting for a subprocess, it also ignores SIGQUIT and SIGINT (see
POSIX).
3.  cp inherits the default disposition and (unless it handles it
specially) dumps core.

I think the general idea here is that interactive shells and similar
things want to ignore signals from users typing ^C (SIGINT) or ^\
(SIGQUIT) so they can affect just the thing that's actually running
at this moment, not the tree of processes waiting.

Yeah, I guess we could have our own pg_system() function that does
roughly what system() does, namely fork(), then execl() in the child
and waitpid() in the parent, but the child could begin a new process
group with setsid() before running execl() (so that it no longer gets
SIGQUIT with the postmaster signals the archiver), and the parent
could record pg_system_child_pid when forking, and install a QUIT
handler that does kill(-pg_system_child_pid, SIGTERM), as well as
setting a flag that will cause its main loop to exit (but not before
it has run waitpid()).  With some carefully placed blocks and unblocks
and ignores, to avoid races.
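A bare-bones sketch of that shape, under the stated assumptions (the
function and variable names are invented; the SIGQUIT handler doing
kill(-pg_system_child_pid, SIGTERM) and the careful signal blocking are
only indicated by comments, so this is an illustration, not the real
thing):

```c
/* Sketch of a pg_system()-style replacement: fork(), setsid() in the child
 * so it leaves the archiver's process group, execl() the shell, and
 * waitpid() in the parent. */
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t pg_system_child_pid = 0;

/* Run a shell command in its own process group; return its exit status, or -1. */
int
pg_system_sketch(const char *command)
{
	pid_t		pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0)
	{
		/*
		 * Child: become a new session/process-group leader, so that a
		 * postmaster kill(-archiver_pid, SIGQUIT) no longer reaches us or
		 * our descendants.
		 */
		if (setsid() < 0)
			_exit(127);
		execl("/bin/sh", "sh", "-c", command, (char *) NULL);
		_exit(127);				/* exec failed */
	}

	/*
	 * Parent: remember the child's pid, so that a QUIT handler could do
	 * kill(-pg_system_child_pid, SIGTERM), then wait for completion.
	 */
	pg_system_child_pid = pid;
	for (;;)
	{
		int			status;

		if (waitpid(pid, &status, 0) == pid)
		{
			pg_system_child_pid = 0;
			return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
		}
		if (errno != EINTR)
		{
			pg_system_child_pid = 0;
			return -1;
		}
	}
}
```

The design choice to highlight is the setsid() call: it is what detaches
the command from the process group that the postmaster signals, which is
exactly where the current behavior differs.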

That all sounds like a lot of work though, and it might be easier to
just make an exception and use SIGTERM to shut down the archiver, as I
think Tom was suggesting.  Unfortunately we have the same problem
elsewhere, where we use popen().  I just wrote a C program that does
just "sleep(60)", ran it with COPY FROM PROGRAM, then sent SIGQUIT to
the postmaster, and got a dumped core.

-- 
Thomas Munro
https://enterprisedb.com




RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-09-02 Thread Tsunakawa, Takayuki
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> Hmm ... is this patch rejected, or is somebody still trying to get it to
> committable state?  David, you're listed as committer.

I don't think it's rejected.  It would be a pity (mottainai) to refuse this, 
because it provides a significant speedup despite being a simple modification.

Again, I think the v2 patch is OK.  Tom's comment was as follows:


[Tom's comment against v2]

FWIW, I tried this patch against current HEAD (959d00e9d).
Using the test case described by Amit at

I do measure an undeniable speedup, close to 35%.

However ... I submit that that's a mighty extreme test case.
(I had to increase max_locks_per_transaction to get it to run
at all.)  We should not be using that sort of edge case to drive
performance optimization choices.

If I reduce the number of partitions in Amit's example from 8192
to something more real-world, like 128, I do still measure a
performance gain, but it's ~ 1.5% which is below what I'd consider
a reproducible win.  I'm accustomed to seeing changes up to 2%
in narrow benchmarks like this one, even when "nothing changes"
except unrelated code.

Trying a standard pgbench test case (pgbench -M prepared -S with
one client and an -s 10 database), it seems that the patch is about
0.5% slower than HEAD.  Again, that's below the noise threshold,
but it's not promising for the net effects of this patch on workloads
that aren't specifically about large and prunable partition sets.

I'm also fairly concerned about the effects of the patch on
sizeof(LOCALLOCK) --- on a 64-bit machine it goes from 72 to 88
bytes, a 22% increase.  That's a lot if you're considering cases
with many locks.

On the whole I don't think there's an adequate case for committing
this patch.

I'd also point out that this is hardly the only place where we've
seen hash_seq_search on nearly-empty hash tables become a bottleneck.
So I'm not thrilled about attacking that with one-table-at-time patches.
I'd rather see us do something to let hash_seq_search win across
the board.



* Extreme test case: 
Not extreme.  Two of our customers, who are considering using PostgreSQL, are 
using thousands of partitions now.  We hit this issue -- a point query gets 
nearly 20% slower after a generic plan is automatically created.  That's the 
reason for this proposal.

* 0.5% slowdown with pgbench:
I think it's below the noise, as Tom said.

* sizeof(LOCALLOCK):
As Andres replied to Tom in the immediately following mail, LOCALLOCK was 
bigger in PG 11.

* Use case is narrow:
No.  The bloated LockMethodLocalHash affects the performance of the items below 
as well as transaction commit/abort:
  - AtPrepare_Locks() and PostPrepare_Locks(): the hash table is scanned twice 
in PREPARE!
  - LockReleaseSession: advisory lock
  - LockReleaseCurrentOwner: ??
  - LockReassignCurrentOwner: ??


Regards
Takayuki Tsunakawa



Re: row filtering for logical replication

2019-09-02 Thread Alexey Zagarin
> There are complaints in the log (both pub and sub) like:
> ERROR: trying to store a heap tuple into wrong type of slot
>
> I have no idea what causes that.

Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 955 of 
0005-Row-filtering-for-logical-replication.patch it should be &TTSOpsHeapTuple 
instead of &TTSOpsVirtual.




Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Masahiko Sawada
On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada  wrote:
>
> On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier  wrote:
> >
> > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > > I tried 1. and it shown index_rebuild_count, but it also shown
> > > > "initializing" phase again and other columns were empty. So, it is
> > > > not suitable to fix the problem. :(
> > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > > time to do that after PGConf.Asia 2019.
> > > > If we selected 3., it affects following these progress reporting
> > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > > I suppose. Any comments welcome! :)
> > >
> > > I looked at this open item. I prefer #3 but I think it would be enough
> > > to have a small stack using a fixed length array to track nested
> > > progress information and the current executing command (maybe 4 or 8
> > > would be enough for maximum nested level for now?). That way, we don't
> > > need any change the interfaces. AFAICS there is no case where we call
> > > only either pgstat_progress_start_command or
> > > pgstat_progress_end_command without each other (although
> > > pgstat_progress_update_param is called without start). So I think that
> > > having a small stack for tracking multiple reports would work.
> > >
> > > Attached the draft patch that fixes this issue. Please review it.
> >
> > Do we actually want to show to the user information about CREATE INDEX
> > which is different than CLUSTER?  It could be confusing for the user
> > to see a progress report from a command different than the one
> > actually launched.
>
> I personally think it would be helpful for users. We provide the
> progress reporting for each commands but it might not be detailed
> enough. But we can provide more details of progress information of
> each commands by combining them. Only users who want to confirm the
> details need to see different progress reports.
>
> > There could be a method 4 here: do not start a new
> > command progress when there is another one already started, and do not
> > try to end it in the code path where it could not be started as it did
> > not stack.  So while I see the advantages of stacking the progress
> > records as you do when doing cascading calls of the progress
> > reporting, I am not sure that:
> > 1) We should bloat more PgBackendStatus for that.
> > 2) We want to add more complication in this area close to the
> > release of 12.
>
> I agreed, especially to 2. We can live with #4 method in PG12 and the
> patch I proposed could be discussed as a new feature.

After more thought, even if we don't start a new command progress when
there is another one already started, the progress update functions
could still be called, and these functions don't specify the command
type. Therefore, the progress information might be overwritten with
wrong values by a different command. We could probably change the
callers of the progress-updating functions so that they skip the
updates when the command could not start a new progress report, but
that might be a big change.

As an alternative idea, we can make pgstat_progress_end_command() have
one argument that is command the caller wants to end. That is, we
don't end the command progress when the specified command type doesn't
match to the command type of current running command progress. We
unconditionally clear the progress information at CommitTransaction()
and AbortTransaction() but we can do that by passing
PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().
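For what it's worth, the proposed signature could look roughly like the
following. This is a stand-alone sketch where st_progress_command is a
plain static variable rather than the real shared-memory PgBackendStatus
field, so the names only mirror the backend:

```c
#include <assert.h>

typedef enum
{
	PROGRESS_COMMAND_INVALID,
	PROGRESS_COMMAND_VACUUM,
	PROGRESS_COMMAND_CLUSTER,
	PROGRESS_COMMAND_CREATE_INDEX
} ProgressCommandType;

/* toy stand-in for beentry->st_progress_command */
static ProgressCommandType st_progress_command = PROGRESS_COMMAND_INVALID;

static void
pgstat_progress_start_command(ProgressCommandType cmdtype)
{
	st_progress_command = cmdtype;
}

/*
 * End the progress report only when it belongs to the caller's command.
 * PROGRESS_COMMAND_INVALID clears unconditionally, as CommitTransaction()
 * and AbortTransaction() would.
 */
static void
pgstat_progress_end_command(ProgressCommandType cmdtype)
{
	if (cmdtype != PROGRESS_COMMAND_INVALID &&
		st_progress_command != cmdtype)
		return;					/* another command's report; leave it alone */
	st_progress_command = PROGRESS_COMMAND_INVALID;
}
```

With this, a nested CREATE INDEX finishing inside CLUSTER would no longer
wipe out CLUSTER's report.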

BTW the following condition in pgstat_progress_end_command() seems to
be wrong. We should return from the function when either
pgstat_track_activities is disabled or the current progress command is
invalid.

   if (!pgstat_track_activities
  && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
  return;
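If that reading is right, the corrected test would use OR rather than
AND. Sketched here as a stand-alone predicate, with made-up stand-ins
for pgstat_track_activities and the beentry field:

```c
#include <assert.h>
#include <stdbool.h>

enum
{
	PROGRESS_COMMAND_INVALID,
	PROGRESS_COMMAND_VACUUM
};

/* returns true when pgstat_progress_end_command() should bail out early */
static bool
should_return_early(bool track_activities, int st_progress_command)
{
	/* corrected: either condition alone is enough to skip the cleanup */
	return !track_activities ||
		st_progress_command == PROGRESS_COMMAND_INVALID;
}
```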

I've attached a patch that fixes the issue I newly found.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_progress_end_command.patch
Description: Binary data


Re: pgbench - rework variable management

2019-09-02 Thread Fabien COELHO




Some windows-specific hacks are not tested.


Somehow this macro hackery has upset the Windows socket headers:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55019


I noticed, but I do not have any windows host so I cannot test locally.

The issue is how to do a mutex on Windows, which does not have pthread so 
it has to be emulated. I'll try again by sending a blind update to the 
patch and see how it goes.
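One common way to emulate just the mutex part without full pthread
support is to wrap a Windows CRITICAL_SECTION behind the pthread names.
A rough sketch follows; note CRITICAL_SECTION is recursive, unlike a
default pthread mutex, so this is only an approximate equivalent. On
POSIX the real pthread calls are used, so the same code builds on both:

```c
#include <assert.h>

#ifdef WIN32
#include <windows.h>

typedef CRITICAL_SECTION pthread_mutex_t;

static int
pthread_mutex_init(pthread_mutex_t *m, void *attr)
{
	InitializeCriticalSection(m);
	return 0;
}

static int
pthread_mutex_lock(pthread_mutex_t *m)
{
	EnterCriticalSection(m);
	return 0;
}

static int
pthread_mutex_unlock(pthread_mutex_t *m)
{
	LeaveCriticalSection(m);
	return 0;
}

static int
pthread_mutex_destroy(pthread_mutex_t *m)
{
	DeleteCriticalSection(m);
	return 0;
}
#else
#include <pthread.h>
#endif
```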


--
Fabien.




Re: row filtering for logical replication

2019-09-02 Thread Euler Taveira
On Tue, Sep 3, 2019 at 00:16, Alexey Zagarin wrote:
>
> There are complaints in the log (both pub and sub) like:
> ERROR: trying to store a heap tuple into wrong type of slot
>
> I have no idea what causes that.
>
>
> Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 955 of 
> 0005-Row-filtering-for-logical-replication.patch it should be 
> &TTSOpsHeapTuple instead of &TTSOpsVirtual.
>
Oops... exactly. That was an oversight while poking with different types of slots.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




RE: SIGQUIT on archiver child processes maybe not such a hot idea?

2019-09-02 Thread Tsunakawa, Takayuki
From: Kyotaro Horiguchi [mailto:horikyota@gmail.com]
> Since we are allowing OPs to use arbitrary command as
> archive_command, providing a replacement with non-standard signal
> handling for a specific command doesn't seem a general solution
> to me. Couldn't we have pg_system(a tentative name), which
> intercepts SIGQUIT then sends SIGINT to children? Might be need
> to resend SIGQUIT after some interval, though..

The same idea that you referred to as pg_system occurred to me, too.  But I 
wondered if the archiver process can get the pid of its child (shell? 
archive_command?), while keeping the capabilities of system() (= the shell).  
Even if we fork() and then system(), doesn't the OS send SIGQUIT to any 
descendents of the archiver when postmaster sends SIGQUIT to the child process 
group?
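For what it's worth, one sketch of the pg_system() idea: fork() and exec
the shell directly instead of calling system(), so the parent knows the
shell's pid and a SIGQUIT handler can relay SIGINT to it. This is toy
code; the real thing would need to worry about races around handler
installation, resending SIGQUIT after a timeout, and descendants spawned
by the shell itself:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t shell_pid = 0;

/* translate the postmaster's SIGQUIT into SIGINT for the shell */
static void
quit_handler(int signo)
{
	if (shell_pid > 0)
		kill(shell_pid, SIGINT);
}

/* system()-like call that keeps the child pid so signals can be relayed */
static int
pg_system(const char *command)
{
	int			status;
	pid_t		pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
	{
		execl("/bin/sh", "sh", "-c", command, (char *) NULL);
		_exit(127);				/* exec failed */
	}
	shell_pid = pid;
	signal(SIGQUIT, quit_handler);
	while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
		continue;
	shell_pid = 0;
	return status;
}
```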


> # Is there any means to view the whole of a thread from archive?
> # I'm a kind of reluctant to wander among messages like a rat in
> # a maze:p

You can see the whole thread by clicking the "Whole Thread" link.


Regards
Takayuki Tsunakawa





Re: Converting NOT IN to anti-joins during planning

2019-09-02 Thread David Rowley
On Tue, 3 Sep 2019 at 09:21, Alvaro Herrera  wrote:
> David, will we hear from you on this patch during this month?
> It sounds from Antonin's review that it needs a few changes.

Thanks for checking.  I'm currently quite committed with things away
from the community and it's unlikely I'll get to this in September.
I'll kick it out to the next 'fest now.

Antonin, thank you for the review. I will respond to it when I get time again.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 05:38:56PM +0900, Michael Paquier wrote:
> Thinking wider, don't we have the same problem with wal_sender_timeout
> in the case where a sync request takes longer than the time it would
> take the backend to terminate the connection?

I have been able to work more on that, and that can indeed happen with
wal_sender_timeout.  While reviewing the code, I have noticed that
there is little point to enable do_sync when fetching WAL segments.
This actually led to too many fsyncs done for the plain format as each
WAL segment is fsync'd first by walmethods.c, then fsync'd again by
fsync_pgdata() in pg_wal/.

Attached is an updated patch, which needs to go down to v10.
--
Michael
From ec75a75514d9601d1b1c3b929ac1eb870a208aa2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 3 Sep 2019 10:48:45 +0900
Subject: [PATCH v2] Delay fsync of pg_basebackup tar format until the end

Since the addition of fsync requests in bc34223 to make base backup data
consistent on disk once pg_basebackup finishes, each tablespace tar file
is individually flushed once completed, with an additional flush of the
parent directory when the base backup finishes.  While holding a
connection to the server, a fsync request taking a long time may cause a
failure of the base backup.  A recent example is tcp_user_timeout, but
wal_sender_timeout can cause similar problems.

While reviewing the code, there was a second issue causing too many
fsync requests to be done when completing WAL records for the tar or
plain format. As recursive fsyncs are done at the end for both formats
for everything written, it is fine to disable fsync when fetching or
streaming WAL files.

Reported-by: Ryohei Takahashi
Author: Michael Paquier
Reviewed-by: Ryohei Takahashi
Discussion: https://postgr.es/m/osbpr01mb4550dae2f8c9502894a45aab82...@osbpr01mb4550.jpnprd01.prod.outlook.com
Backpatch-through: 10
---
 src/bin/pg_basebackup/pg_basebackup.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9207109ba3..da475319b6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -486,15 +486,18 @@ LogStreamerMain(logstreamer_param *param)
 #endif
 	stream.standby_message_timeout = standby_message_timeout;
 	stream.synchronous = false;
-	stream.do_sync = do_sync;
+	/* fsync happens at the end of pg_basebackup for all data */
+	stream.do_sync = false;
 	stream.mark_done = true;
 	stream.partial_suffix = NULL;
 	stream.replication_slot = replication_slot;
 
 	if (format == 'p')
-		stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0, do_sync);
+		stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+	stream.do_sync);
 	else
-		stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, do_sync);
+		stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel,
+			  stream.do_sync);
 
 	if (!ReceiveXlogStream(param->bgconn, &stream))
 
@@ -1346,9 +1349,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	if (copybuf != NULL)
 		PQfreemem(copybuf);
 
-	/* sync the resulting tar file, errors are not considered fatal */
-	if (do_sync && strcmp(basedir, "-") != 0)
-		(void) fsync_fname(filename, false);
+	/*
+	 * Do not sync the resulting tar file yet, all files are synced once at
+	 * the end.
+	 */
 }
 
 
@@ -2138,9 +2142,9 @@ BaseBackup(void)
 
 	/*
 	 * Make data persistent on disk once backup is completed. For tar format
-	 * once syncing the parent directory is fine, each tar file created per
-	 * tablespace has been already synced. In plain format, all the data of
-	 * the base directory is synced, taking into account all the tablespaces.
+	 * sync the parent directory and all its contents as each tar file was not
+	 * synced after being completed.  In plain format, all the data of the
+	 * base directory is synced, taking into account all the tablespaces.
 	 * Errors are not considered fatal.
 	 */
 	if (do_sync)
@@ -2150,7 +2154,7 @@ BaseBackup(void)
 		if (format == 't')
 		{
 			if (strcmp(basedir, "-") != 0)
-(void) fsync_fname(basedir, true);
+(void) fsync_dir_recurse(basedir);
 		}
 		else
 		{
-- 
2.23.0



signature.asc
Description: PGP signature


Re: pg_waldump and PREPARE

2019-09-02 Thread Fujii Masao
On Tue, Sep 3, 2019 at 3:04 AM Alvaro Herrera  wrote:
>
> On 2019-Aug-01, Michael Paquier wrote:
>
> > I may be missing something of course, but in this case we argued about
> > adding a couple of more fields.  In consequence, the patch should be
> > waiting on its author, no?
>
> Fujii,
>
> Are you in a position to submit an updated version of this patch?

Sorry for the long delay... Yes, I will update the patch if necessary.

Regards,

-- 
Fujii Masao




Re: Avoid full GIN index scan when possible

2019-09-02 Thread Alvaro Herrera
On 2019-Aug-07, Tom Lane wrote:

> I think this would be committable as it stands, except that replacing
> an ALL scan with an EVERYTHING scan could be a performance regression
> if the index contains many null items.  We need to do something about
> that before committing.

Nikita, any word on getting this change done?

> Unfortunately I'm not sold on either 0002 or 0003 as they stand;
> they seem overly complicated, I'm not convinced they're correct,
> and you haven't really provided examples showing that all this
> extra complexity is worthwhile.

I suppose we should call ourselves satisfied if we get 0001 done during
this cycle (or at least this commitfest).  Further refinement can be had
in the future, as needed -- even within pg13, if Nikita or anybody else
wants to tackle Tom's suggested approaches (or something completely new,
or just contest Tom's points) quickly enough.  But I don't think we need
that in order to call this CF entry committed.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_waldump and PREPARE

2019-09-02 Thread Fujii Masao
Sorry for the long delay...

On Thu, Jul 4, 2019 at 5:25 PM Julien Rouhaud  wrote:
>
> On Thu, Jul 4, 2019 at 9:45 AM Michael Paquier  wrote:
> >
> > On Wed, Jul 03, 2019 at 08:23:44PM +0200, Julien Rouhaud wrote:
> > > So the patch compiles and works as intended. I don't have much to say
> > > about it, it all looks good to me, since the concerns about xactdesc.c
> > > aren't worth the trouble.
> > >
> > > I'm not sure that I understand Michael's objection though, as
> > > xl_xact_prepare is not a new definition and AFAICS it couldn't contain
> > > the records anyway.  So I'll let him say if he has further objections
> > > or if it's ready for committer!
> >
> > This patch provides parsing information only for the header of the 2PC
> > record.  Wouldn't it be interesting to get more information from the
> > various TwoPhaseRecordOnDisk's callbacks?  We could also print much
> > more information in xact_desc_prepare().  Like the subxacts, the XID,
> > the invalidation messages and the delete-on-abort/commit rels.
>
> Most of those are already described in the COMMIT PREPARE message,
> wouldn't that be redundant?  abortrels aren't displayed anywhere
> though, so +1 for adding them.

xact_desc_abort() for ROLLBACK PREPARED describes abortrels. No?

> I also see that the dbid isn't displayed in any of the 2PC message,
> that'd be useful to have it directly instead of looking for it in
> other messages for the same transaction.

dbid is not reported even in COMMIT message. So I don't like adding
dbid into only the PREPARE message.

Regards,

-- 
Fujii Masao




Re: Hypothetical indexes using BRIN broken since pg10

2019-09-02 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, Jul 26, 2019 at 1:34 PM Heikki Linnakangas  wrote:
>> The patch assumes the default pages_per_range setting, but looking at
>> the code at https://github.com/HypoPG/hypopg, the extension actually
>> takes pages_per_range into account when it estimates the index size. I
>> guess there's no easy way to pass the pages_per_range setting down to
>> brincostestimate(). I'm not sure what we should do about that, but seems
>> that just using BRIN_DEFAULT_PAGES_PER_RANGE here is not very accurate.

> Yes, hypopg can use a custom pages_per_range as it intercepts it when
> the hypothetical index is created.  I didn't find any way to get that
> information in brincostestimate(), especially for something that could
> backpatched.  I don't like it, but I don't see how to do better than
> just using BRIN_DEFAULT_PAGES_PER_RANGE :(

I can tell you what I think ought to happen, but making it happen might
be more work than this patch should take on.

The right answer IMO is basically for the brinGetStats call to go
away from brincostestimate and instead happen during plancat.c's
building of the IndexOptInfo.  In the case of a hypothetical index,
it'd fall to the get_relation_info_hook to fill in suitable fake
data.  Sounds simple, but:

1. We really don't want even more AM-specific knowledge in plancat.c.
So I think the right way to do this would be something along the
line of adding a "void *amdata" field to IndexOptInfo, and adding
an AM callback to be called during get_relation_info that's allowed
to fill that in with some AM-specific data (which the AM's costestimate
routine would know about).  The existing btree-specific hacks in
get_relation_info should migrate into btree's version of this callback,
and IndexOptInfo.tree_height should probably go away in favor of
keeping that in btree's version of the amdata struct.

2. This approach puts a premium on the get_relation_info callback
being cheap, because there's no certainty that the data it fills
into IndexOptInfo.amdata will ever get used.  For btree, the 
_bt_getrootheight call is cheap enough to not be a problem, because
it just looks at the metapage data that btree keeps cached in the
index's relcache entry.  The same cannot be said for brinGetStats
as it stands: it goes off to read the index metapage.  There are
at least two ways to fix that:

2a. Teach brin to keep the metapage cached like btree does.
This seems like it could be a performance win across the board,
but you'd need to work out invalidation behavior, and it'd be
a bit invasive.

2b. Define IndexOptInfo.amdata as being filled lazily, that is
brincostestimate will invoke brinGetStats and fill in the data
if the pointer is currently NULL.  Then a hypothetical-index
plugin could override that by pre-filling the field with the
desired fake data.

I don't have a problem with allowing brincostestimate to fill
in defaults based on BRIN_DEFAULT_PAGES_PER_RANGE if it sees
that amdata is null and the index is hypothetical.  But there
should be a way for the get_relation_info_hook to do better.

BTW, the current patch doesn't apply according to the cfbot,
but I think it just needs a trivial rebase over 9c703c169
(ie, assume the index is already locked).  I didn't bother
to do that since what I recommend above would require a
lot more change in that area.

regards, tom lane




Re: pgbench - rework variable management

2019-09-02 Thread Thomas Munro
On Wed, Aug 14, 2019 at 3:54 AM Fabien COELHO  wrote:
> Some windows-specific hacks are not tested.

Somehow this macro hackery has upset the Windows socket headers:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55019

-- 
Thomas Munro
https://enterprisedb.com




Re: Unix-domain socket support on Windows

2019-09-02 Thread Thomas Munro
On Wed, Aug 14, 2019 at 6:27 AM Peter Eisentraut
 wrote:
> This patch set needs testers with various Windows versions to test
> different configurations, combinations, and versions.

It's failing to build on cfbot's AppVeyor setup[1].  That's currently
using Windows SDK 7.1, so too old for the new AF_UNIX sockets, but
presumably something is wrong because it shouldn't fail to compile and
link.

src/interfaces/libpq/fe-connect.c(2682): warning C4101: 'pwdbuf' :
unreferenced local variable [C:\projects\postgresql\libpq.vcxproj]
src/interfaces/libpq/fe-connect.c(2687): warning C4101: 'passerr' :
unreferenced local variable [C:\projects\postgresql\libpq.vcxproj]

fe-connect.obj : error LNK2019: unresolved external symbol getpeereid
referenced in function PQconnectPoll
[C:\projects\postgresql\libpq.vcxproj]

[1] 
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55034?fullLog=true

-- 
Thomas Munro
https://enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2019-09-02 Thread David Steele

On 9/2/19 6:12 PM, Alvaro Herrera wrote:

I don't understand why this patch record has been kept alive for so long,
since no new version has been sent in ages.  If this patch is really
waiting on the author, let's see the author do something.  If no voice
is heard very soon, I'll close this patch as RwF.


+1.  I should have marked this RWF in March but I ignored it because it 
was tagged v13 before the CF started.


--
-David
da...@pgmasters.net




Re: pg_upgrade version checking questions

2019-09-02 Thread Daniel Gustafsson
> On 2 Sep 2019, at 19:59, Alvaro Herrera  wrote:
> 
> On 2019-Jul-30, Daniel Gustafsson wrote:
> 
>> I’ll take a stab at tidying all of this up to require less duplication, we’ll
>> see where that ends up.
> 
> Hello Daniel, are you submitting a new version soon?

I am working on an updated version which unfortunately got a bit delayed, but
will be submitted shortly (targeting this week).

cheers ./daniel



Re: [PATCH] kNN for btree

2019-09-02 Thread Alvaro Herrera
On 2019-Sep-03, Alexander Korotkov wrote:

> I think patches 0001-0008 are very clear and extend our index-AM
> infrastructure in a quite straightforward way.  I'm going to propose
> them for commit after some further polishing.

Hmm.  Why is 0001 needed?  I see that 0005 introduces a call to that
function, but if attnum == 0 then it doesn't call it.  Maybe it was
necessary in an older version of the patch?

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] kNN for btree

2019-09-02 Thread Alexander Korotkov
On Mon, Sep 2, 2019 at 9:11 PM Alvaro Herrera  wrote:
>
> On 2019-Jul-12, Nikita Glukhov wrote:
>
> > Attached 13th version of the patches.
>
> Hello Nikita,
>
> Can you please rebase this again?

Nikita is on vacation now.  Rebased patchset is attached.

I think patches 0001-0008 are very clear and extend our index-AM
infrastructure in a quite straightforward way.  I'm going to propose
them for commit after some further polishing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0004-Extract-structure-BTScanState-v14.patch.gz
Description: GNU Zip compressed data


0003-Enable-ORDER-BY-operator-scans-on-ordered-indexe-v14.patch.gz
Description: GNU Zip compressed data


0005-Add-kNN-support-to-btree-v14.patch.gz
Description: GNU Zip compressed data


0002-Introduce-ammorderbyopfirstcol-v14.patch.gz
Description: GNU Zip compressed data


0001-Fix-get_index_column_opclass-v14.patch.gz
Description: GNU Zip compressed data


0006-Add-btree-distance-operators-v14.patch.gz
Description: GNU Zip compressed data


0007-Remove-distance-operators-from-btree_gist-v14.patch.gz
Description: GNU Zip compressed data


0009-Introduce-ammatchorderby-function-v14.patch.gz
Description: GNU Zip compressed data


0008-Add-regression-tests-for-kNN-btree-v14.patch.gz
Description: GNU Zip compressed data


0010-Add-btree-support-of-kNN-on-non-first-column-v14.patch.gz
Description: GNU Zip compressed data


0011-Allow-ammatchorderby-to-return-pathkey-sublists-v14.patch.gz
Description: GNU Zip compressed data


0012-Add-support-of-array-ops-to-btree-kNN-v14.patch.gz
Description: GNU Zip compressed data


Re: row filtering for logical replication

2019-09-02 Thread Erik Rijkers

On 2019-09-02 01:43, Euler Taveira wrote:
On Sun, Sep 1, 2019 at 06:09, Erik Rijkers wrote:


The first 4 of these apply without error, but I can't get 0005 to 
apply.

This is what I use:


Erik, I generate a new patch set with patience diff algorithm. It
seems it applies cleanly.



It did apply cleanly, thanks.

But I can't get it to correctly do the partial replication in the 
attached pgbench-script (similar versions of which script I also used 
for earlier versions of the patch, last year).


There are complaints in the log (both pub and sub) like:
ERROR:  trying to store a heap tuple into wrong type of slot

I have no idea what causes that.

I attach a zip:

$ unzip -l logrep_rowfilter.zip
Archive:  logrep_rowfilter.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    17942  2019-09-03 00:47   logfile.6525
    10412  2019-09-03 00:47   logfile.6526
     6913  2019-09-03 00:47   logrep_rowfilter_2_nodes.sh
     3371  2019-09-03 00:47   output.txt
---------                     -------
    38638                     4 files

That bash script runs 2 instances (as compiled on my local setup so it 
will not run as-is) and tries for one minute to get a slice of the 
pgbench_accounts table replicated.  One minute is short but I wanted 
short logfiles; I have tried the same up to 20 minutes without the 
replication completing.  I'll try even longer but in the meantime I hope 
you can figure out why these errors occur.



thanks,


Erik Rijkers




Re: XPRS

2019-09-02 Thread Thomas Munro
On Tue, Sep 3, 2019 at 5:20 AM Tomas Vondra
 wrote:
> FWIW it's not clear to me why the cost would need to be recomputed after
> constructing the parallel version of the plan? My understanding is that
> the idea is to do cost-based planning for the serial plan, and then just
> "mechanically" construct a parallel plan. Although, maybe there could be
> multiple parallel alternatives ...

Presumably you still need to choose between the serial and parallel
plans by comparing costs.  You lose some by adding exchange operators,
but you win some by dividing cardinality estimates.

> >This is sort of fudging together of ideas from conversations with
> >Kevin Grittner (who talked about admission control a few years back),
> >Peter Geoghegan (who mentioned opportunistically asking for more), and
> >things I've heard of on SQL Server ("memory grants").  I think it
> >would provide some relief from the problems we see today: it's hard to
> >set work_mem so that you never get OOM but you can still use a decent
> >amount of your precious memory, especially with mixed parallel and
> >non-parallel query workloads thanks to our current
> >work_mem-multiplying design.
>
> I think this is probably the simplest and most realistic first step.
>
> Whenever I was thinking about memory acquisition, I've assumed we'd
> monitor how much memory the plan is expected to use while we're
> constructing it. My main problem was what to do when we reach the
> per-query limit - whether to (a) simply reject the plan, (b) go back and
> see if we can replan with lower work_mem (but how much and for which
> nodes?), or (c) just continue.

Yeah, it's all quite tricky and circular.  But I'm pretty sure that we
need caps at execution time, anyway, so I think it's OK to start at
that end of the problem and then later try to improve the way the
planner.

> The proposed plan deals with this by not limiting the per-query (or rather
> per-session) budget directly, and instead requesting additional
> budget. Which is nice.
>
> I suspect we should also keep an additional plan that is expected to meet
> the session_work_mem limit, aside from the regular cheapest plan, and use
> it if it's not much worse. Imagine you have a plan with cost 1000 that
> needs (global_work_mem/2 + 1kB) memory, essentially serializing executions
> of this query. And then there's an alternative plan with cost 1100 that
> can run with session_work_mem. It seems better to just accept the second
> plan, because it won't need to wait.

Hmm.  I wonder if it's worth it.  You could also just replan as you
said, but I'm wondering if just rejecting the query would be OK.

> Another challenge with work_mem is that anyone can modify it arbitrarily,
> i.e. a user can do
>
>   SET work_mem = '1TB';
>
> and use as much memory as they wish, or even crash the system. I wonder if
> we could define the new GUCs (session_work_mem and global_work_mem) in a
> way to prevent this. We probably don't want to make them PGC_POSTMASTER
> (it seems useful to allow overriding them in ALTER USER/DATABASE), but I
> don't think we have a good way to do that at the moment. Any ideas in this
> direction?

How about something giving the superuser the following GUCs:

global_work_mem = 16GB
session_min_work_mem = 0.5%  -- the amount of quota sessions keep, for
fast small queries
session_max_work_mem = 20% -- the maximum quota any one session is allowed
session_extra_work_mem = 5% -- opportunistic execution-time boost

Users are free to plan queries with work_mem = 1TB, and if you do that
and it estimates that it wants 512GB, it will be rejected if you try
to execute it because it exceeds session_max_work_mem, with a hint
telling you to turn down work_mem.  Otherwise it either runs or joins
the queue if it can't get the quota it needs immediately.

Eventually we could try to figure out how to set work_mem to automatic
(I don't want to propose a concrete rule, but maybe something based on
session_max_work_mem / njoins, with various fudge factors, and some
accounting for parallel workers; it's probably good to low-ball it and
rely on session_extra_work_mem).
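As a toy model of the admission rule described above (percent-style GUCs
and sizes in kB; the names and numbers are all made up for illustration):

```c
#include <assert.h>

typedef enum
{
	ADMIT_RUN,					/* quota available, go ahead */
	ADMIT_QUEUE,				/* under the session cap but must wait */
	ADMIT_REJECT				/* exceeds session_max_work_mem: hint the
								 * user to turn down work_mem */
} AdmitResult;

static double global_work_mem_kb = 16.0 * 1024 * 1024;	/* 16GB */
static double session_max_work_mem_pct = 20.0;

static AdmitResult
admit_query(double estimated_kb, double free_quota_kb)
{
	double		session_cap_kb =
		global_work_mem_kb * session_max_work_mem_pct / 100.0;

	if (estimated_kb > session_cap_kb)
		return ADMIT_REJECT;
	if (estimated_kb > free_quota_kb)
		return ADMIT_QUEUE;
	return ADMIT_RUN;
}
```

So a 512GB estimate against a 20% cap of 16GB would be rejected outright,
while queries under the cap either run or queue depending on what quota
is currently free.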

Yeah, I think you'd want to be able to set session_XXX on databases
and roles so that you can say your regular users can't eat more than
10% of memory each, but a big reporting thing is allowed more.

-- 
Thomas Munro
https://enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-09-02 Thread Alvaro Herrera
In the interest of moving things forward, how far are we from making
0001 committable?  If I understand correctly, the rest of this patchset
depends on https://commitfest.postgresql.org/24/944/ which seems to be
moving at a glacial pace (or, actually, slower, because glaciers do
move, which cannot be said of that other patch.)

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2019-09-02 Thread Noah Misch
On Mon, Sep 02, 2019 at 05:15:00PM -0400, Alvaro Herrera wrote:
> I have updated this patch's status to "needs review", since v20 has not
> received any comments yet.
> 
> Noah, you're listed as committer for this patch.  Are you still on the
> hook for getting it done during the v13 timeframe?

Yes, assuming "getting it done" = "getting the CF entry to state other than
Needs Review".




Re: Converting NOT IN to anti-joins during planning

2019-09-02 Thread Alvaro Herrera
David, will we hear from you on this patch during this month?
It sounds from Antonin's review that it needs a few changes.

Thanks

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2019-09-02 Thread Alvaro Herrera
I don't understand why this patch record has been kept alive for so long,
since no new version has been sent in ages.  If this patch is really
waiting on the author, let's see the author do something.  If no voice
is heard very soon, I'll close this patch as RwF.

If others want to see this feature in PostgreSQL, they are welcome to
contribute.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2019-09-02 Thread Alvaro Herrera
I have updated this patch's status to "needs review", since v20 has not
received any comments yet.

Noah, you're listed as committer for this patch.  Are you still on the
hook for getting it done during the v13 timeframe?

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

2019-09-02 Thread Tom Lane
Andres Freund  writes:
> I agree that it ought to be more efficent - but also about as equally
> safe? I.e. if the previous code wasn't safe, the new code wouldn't be
> safe either? As in, we're "just" avoiding the assert, but not increasing
> safety?

Well, the point is that the old code risks performing a relcache load
without holding any lock on the relation.  In practice, given that
we do hold a lock on the parent table, it's probably safe ... but
it's at best bad practice.  It's not too hard to imagine future
optimizations that would allow this to result in a corrupt relcache entry.

I don't believe that there's any equivalent risk in the modified code.

regards, tom lane




Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

2019-09-02 Thread Andres Freund
Hi,

On 2019-09-01 16:50:00 -0400, Tom Lane wrote:
> As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty
> duff anyway, because it doesn't bother to check for the defined failure
> return for RelationIdGetRelation.  But if we're concerned about the
> cost of recalculating this stuff per-row, couldn't we cache it a little
> better?  It should be safe to assume the set of index columns isn't
> changing intra-query.

I agree that it ought to be more efficent - but also about as equally
safe? I.e. if the previous code wasn't safe, the new code wouldn't be
safe either? As in, we're "just" avoiding the assert, but not increasing
safety?


> ... in fact, isn't all the infrastructure for that present already?
> Why is this code looking directly at the index at all, rather than
> using the relcache's rd_idattr bitmap?

No idea, that's too long ago :(


> I'm a bit suspicious of the exclusion for idattrs being empty, but
> if I remove that, some of the contrib/test_decoding test results
> change.  Anybody want to comment on that?  If that's actually an
> expected situation, why is there an elog(DEBUG) in that path?

I think Amit's explanation here is probably accurate.

Greetings,

Andres Freund




Re: Patch to add hook to copydir()

2019-09-02 Thread Swen Kooij
Thank you both for the feedback. To give some background on what I am
trying to do: it is mostly experimental work right now.

I've built an extension that takes advantage of the copy-on-write
properties of ZFS and BTRFS. A tool transforms a data directory
into a set of datasets/sub volumes. When a new database is
created and the directories are copied, the extension uses the
filesystem's ability to create a snapshot/clone of the source dataset/sub
volume.

As createdb() iterates over the template database's tablespaces
and copies the directories to the new one, the copydir() gets invoked
(from dbcommands.c). My extension hooks copydir() and instead
snapshots the source dir and clones it into the target dir (respecting the
recurse flag).

If the source dir is not a ZFS data set or BTRFS sub volume, the
standard implementation gets invoked and everything is business
as usual.

As described in the first paragraph, I've built a small tool that transforms
all base/* directories in the data directory into ZFS datasets or BTRFS
sub volumes. New tablespaces that get created later would have to be
ZFS datasets or BTRFS sub volumes as well in order for the extension
to work. As explained above, if they are not, the extension will just copy
the files.

I also hook ProcessUtility in order to clean up the ZFS datasets and
BTRFS subvolumes _after_ a database gets deleted. Postgres just
handles the deletion of the files and the extension cleans up any
dangling ZFS/BTRFS resources.

Is there anything that I am missing? My early experiments have been
very promising but my experience with Postgres internals is limited. Any
help or feedback would be much appreciated.

On Mon, Sep 2, 2019 at 11:06 PM Alvaro Herrera  wrote:
>
> On 2019-Sep-02, Peter Eisentraut wrote:
>
> > On 2019-09-02 20:54, Swen Kooij wrote:
> > > I've been working on an extension that tightly integrates
> > > postgres with the underlying filesystem. I need to customize
> > > how postgres copies directories for new databases.
> >
> > Could you share some more details, so we can assess whether that is a
> > sensible way to go about it, and what other hooks might be needed?
>
> It seems either terribly high-level, or terribly low-level, depending on
> how you look at it.  I wonder to what extent it conflicts with the table
> AM work, and with the idea of allowing FDWs to have real relfilenodes.
> I wonder if this is just one missing piece that's needed to complete
> something of a layer for which other pieces are already satisfied by
> other hooks.
>
> As is and pending further explanation, it seems a bad idea to me.  If we
> want pluggability here, maybe we need some new abstraction layer.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Patch to add hook to copydir()

2019-09-02 Thread Alvaro Herrera
On 2019-Sep-02, Peter Eisentraut wrote:

> On 2019-09-02 20:54, Swen Kooij wrote:
> > I've been working on an extension that tightly integrates
> > postgres with the underlying filesystem. I need to customize
> > how postgres copies directories for new databases.
> 
> Could you share some more details, so we can assess whether that is a
> sensible way to go about it, and what other hooks might be needed?

It seems either terribly high-level, or terribly low-level, depending on
how you look at it.  I wonder to what extent it conflicts with the table
AM work, and with the idea of allowing FDWs to have real relfilenodes.
I wonder if this is just one missing piece that's needed to complete
something of a layer for which other pieces are already satisfied by
other hooks.

As is and pending further explanation, it seems a bad idea to me.  If we
want pluggability here, maybe we need some new abstraction layer.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 12 draft release notes

2019-09-02 Thread Jonathan S. Katz
On 5/12/19 11:42 PM, Bruce Momjian wrote:
> On Sun, May 12, 2019 at 10:49:07AM -0400, Jonathan Katz wrote:
>> Hi Bruce,
>>
>> On 5/11/19 4:33 PM, Bruce Momjian wrote:
>>> I have posted a draft copy of the PG 12 release notes here:
>>>
>>> http://momjian.us/pgsql_docs/release-12.html
>>>
>>> They are committed to git.  It includes links to the main docs, where
>>> appropriate.  Our official developer docs will rebuild in a few hours.
>>
>> Thank you for working on this, I know it's a gargantuan task.
>>
>> I have a small modification for a section entitled "Source Code" which
>> is a repeat of the previous section. Based on the bullet points in that
>> part, I thought "Documentation" might be a more appropriate name; please
>> see attached.
> 
> Yes, I saw that myself and just updated it.  Thanks.

Great!

There is a placeholder for PG_COLORS:

  
   This is enabled with by setting environment variable
   PG_COLORS.  EXAMPLE?
  

I've attached the following that provides an example of how to use this
environmental variable.

Thanks!

Jonathan
diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml
index 8b668ade0f..e05aacaf3c 100644
--- a/doc/src/sgml/release-12.sgml
+++ b/doc/src/sgml/release-12.sgml
@@ -2479,7 +2479,9 @@ Author: Peter Eisentraut 
 
   
This is enabled with by setting environment variable
-   PG_COLORS.  EXAMPLE?
+   PG_COLORS using ANSI escape codes for colors. For 
example,
+   the default setting is
+   PG_COLORS="error=01;31:warning=01;35:locus=01"
   
  
 




Re: Proposal for Signal Detection Refactoring

2019-09-02 Thread Alvaro Herrera
On 2019-Mar-06, Chris Travers wrote:

> Here's a new patch.  No rush on it.  I am moving it to next commitfest
> anyway because as code documentation I think this is a low priority late in
> the release cycle.

Chris, are you submitting an updated patch?  There's some unaddressed
feedback from Tom, which looks like it wouldn't take much time to
address.  There are also some comments from Craig, but they're not
directly related as he's suggesting to fix code that does not follow
what you're documenting; so while it'd be a good idea to fix it, it
doesn't get in the way of your patch.  I won't stop you if you want to
post patches for that, though.

Either way, please keep things moving.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Patch to add hook to copydir()

2019-09-02 Thread Swen Kooij
I just realized I completely borked the patch file.

My apologies. Attached a (hopefully) correct patch file.

---
Swen Kooij

On Mon, Sep 2, 2019 at 9:54 PM Swen Kooij  wrote:
>
> Hello all,
>
> I've been working on an extension that tightly integrates
> postgres with the underlying filesystem. I need to customize
> how postgres copies directories for new databases.
>
> I first looked at the ProcessUtility_hook. This would require
> me to copy or rewrite most of the createdb() function. This is
> less than ideal of course. Someone on the IRC channel
> suggested I could add a hook for copydir().
>
> I implemented the hook similar to how the
> ProcessUtility_hook is implemented. I couldn't find any tests
> for any of the existing hooks. I've been looking at the regression
> tests, but I am not entirely sure how to proceed on that front.
>
> I tested my patch extensively against master and
> the REL_12_STABLE branch. All tests pass and the patch has
> been working great with my extension.
>
> I attached a first draft of the patch against master.
>
> ---
> Swen Kooij
From 7033aed34e3b35859c470a738a5efbe48f101418 Mon Sep 17 00:00:00 2001
From: Swen Kooij 
Date: Sun, 1 Sep 2019 22:42:04 +0300
Subject: [PATCH] Add hook for copydir()

Plugins/extensions that want to customize how
PostgreSQL interact with the filesystem can hook
in copydir() instead of having to hook in on
a higher level and potentially copy large amounts
of code.

One example use case is wanting to customize how
directories for new databases are created. Without
this hook, one would have to use ProcessUtility_hook.
This would require copying the vast majority of createdb().
---
 src/backend/storage/file/copydir.c | 19 +++
 src/include/storage/copydir.h  |  5 +
 2 files changed, 24 insertions(+)

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index ca1c9cd765..e44bebf947 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,6 +27,9 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 
+/* Hook for plugins to get control in copydir() */
+copydir_hook_type copydir_hook = NULL;
+
 /*
  * copydir: copy a directory
  *
@@ -36,6 +39,22 @@
 void
 copydir(char *fromdir, char *todir, bool recurse)
 {
+	if (copydir_hook)
+		(*copydir_hook) (fromdir, todir, recurse);
+	else
+		standard_copydir(fromdir, todir, recurse);
+}
+
+/*
+ * copydir: copy a directory
+ *
+ * If recurse is false, subdirectories are ignored.  Anything that's not
+ * a directory or a regular file is ignored.
+ */
+void
+standard_copydir(char *fromdir, char *todir, bool recurse)
+{
+
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		fromfile[MAXPGPATH * 2];
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index 525cc6203e..8b73184177 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,7 +13,12 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+/* Hook for plugins to get control in copydir() */
+typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse);
+extern PGDLLIMPORT copydir_hook_type copydir_hook;
+
 extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void standard_copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
 
 #endif			/* COPYDIR_H */
-- 
2.20.1



Re: Patch to add hook to copydir()

2019-09-02 Thread Peter Eisentraut
On 2019-09-02 20:54, Swen Kooij wrote:
> I've been working on an extension that tightly integrates
> postgres with the underlying filesystem. I need to customize
> how postgres copies directories for new databases.

Could you share some more details, so we can assess whether that is a
sensible way to go about it, and what other hooks might be needed?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Patch to add hook to copydir()

2019-09-02 Thread Swen Kooij
Hello all,

I've been working on an extension that tightly integrates
postgres with the underlying filesystem. I need to customize
how postgres copies directories for new databases.

I first looked at the ProcessUtility_hook. This would require
me to copy or rewrite most of the createdb() function. This is
less than ideal of course. Someone on the IRC channel
suggested I could add a hook for copydir().

I implemented the hook similar to how the
ProcessUtility_hook is implemented. I couldn't find any tests
for any of the existing hooks. I've been looking at the regression
tests, but I am not entirely sure how to proceed on that front.

I tested my patch extensively against master and
the REL_12_STABLE branch. All tests pass and the patch has
been working great with my extension.

I attached a first draft of the patch against master.

---
Swen Kooij
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index ca1c9cd765..e44bebf947 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,6 +27,9 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 
+/* Hook for plugins to get control in copydir() */
+copydir_hook_type copydir_hook = NULL;
+
 /*
  * copydir: copy a directory
  *
@@ -36,6 +39,22 @@
 void
 copydir(char *fromdir, char *todir, bool recurse)
 {
+	if (copydir_hook)
+		(*copydir_hook) (fromdir, todir, recurse);
+	else
+		standard_copydir(fromdir, todir, recurse);
+}
+
+/*
+ * copydir: copy a directory
+ *
+ * If recurse is false, subdirectories are ignored.  Anything that's not
+ * a directory or a regular file is ignored.
+ */
+void
+standard_copydir(char *fromdir, char *todir, bool recurse)
+{
+
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		fromfile[MAXPGPATH * 2];
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index 525cc6203e..8b73184177 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,7 +13,12 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+/* Hook for plugins to get control in copydir() */
+typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse);
+extern PGDLLIMPORT copydir_hook_type copydir_hook;
+
 extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void standard_copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
 
 #endif			/* COPYDIR_H */


Re: Patch to document base64 encoding

2019-09-02 Thread Karl O. Pinc
Hi Alvaro,

On Mon, 2 Sep 2019 13:56:28 -0400
Alvaro Herrera  wrote:

> Are you submitting an updated version soon?

I don't expect to be able to make a new patch for at
least another week.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Patch to document base64 encoding

2019-09-02 Thread Alvaro Herrera
On 2019-Aug-02, Karl O. Pinc wrote:

> On Fri, 02 Aug 2019 10:44:43 -0400
> Tom Lane  wrote:
> 
> > I don't really have a problem with
> > repeating the entries for other functions that exist in both
> > text and bytea variants, either.
> 
> Ok.  Thanks.  I'll repeat entries then.

Hello Karl,

Are you submitting an updated version soon?

Tom, you're still listed as committer for this patch.  Just a heads up.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_upgrade version checking questions

2019-09-02 Thread Alvaro Herrera
On 2019-Jul-30, Daniel Gustafsson wrote:

> I’ll take a stab at tidying all of this up to require less duplication; we’ll
> see where that ends up.

Hello Daniel, are you submitting a new version soon?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

2019-09-02 Thread Tom Lane
Amit Langote  writes:
> On Mon, Sep 2, 2019 at 6:31 AM Tom Lane  wrote:
>> Here's a proposed patch along those lines.  It fixes Hadi's original
>> crash case and passes check-world.

> Agree that this patch would be a better solution for Hadi's report,
> although I also agree that the situation with index locking for DELETE
> isn't perfect.

Thanks for checking!

>> I'm a bit suspicious of the exclusion for idattrs being empty, but
>> if I remove that, some of the contrib/test_decoding test results
>> change.  Anybody want to comment on that?  If that's actually an
>> expected situation, why is there an elog(DEBUG) in that path?

> ISTM that the exclusion case may occur with the table's replica
> identity being REPLICA_IDENTITY_DEFAULT and there being no primary
> index defined, in which case nothing needs to get logged.

Looking more closely, the case is unreachable in the heap_update
path because key_changed will necessarily be false if the idattrs
set is empty.  But it is reachable in heap_delete because that
just passes key_changed = constant true, whether or not there's
any defined replica identity.  In view of that, I think
we should just remove the elog(DEBUG) ... and maybe add a comment
explaining this.

regards, tom lane




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-09-02 Thread Alvaro Herrera
On 2019-Aug-14, David Rowley wrote:

> For now, I'm out of ideas. If anyone else feels like suggesting
> something of picking this up, feel free.

Hmm ... is this patch rejected, or is somebody still trying to get it to
committable state?  David, you're listed as committer.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add test case for sslinfo

2019-09-02 Thread Alvaro Herrera
Hao Wu,

Are you submitting an updated version of this patch soon?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 12 draft release notes

2019-09-02 Thread Jonathan S. Katz
On 9/2/19 1:37 PM, Andrey Borodin wrote:
> 
> 
>> On 2 Sep 2019, at 22:02, Jonathan S. Katz wrote:
>>
>>
>> Attached is a patch proposing items for the major items section. This is
>> working off of the ongoing draft of the press release[1]. Feedback
>> welcome. With respect to the linking, I tried to give a bunch of
>> jumping off points for users to explore the features, but visually tried
>> to ensure the readability was consistent.
> 
> +
> + Reduction of WAL overhead of
> + GiST, GIN, 
> and
> + SP-GiST indexes and added support
> + for covering indexes via the 
> + INCLUDE clause for
> + SP-GiST indexes
> +
> 
> Maybe I'm missing something, but covering indexes are supported in GiST, not 
> in SP-GiST.

Nope, you're absolutely correct: that was a typo as a result of
copying/pasting formatting. Attached is a revision that correctly
specifies covering indexes for GiST.

Thanks!

Jonathan
diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml
index 8b668ade0f..62ab9fb21f 100644
--- a/doc/src/sgml/release-12.sgml
+++ b/doc/src/sgml/release-12.sgml
@@ -21,7 +21,125 @@

 
 
- big item
+ 
+  General performance improvements, including:
+  
+   
+
+ Optimizations to space utilization and read/write performance for
+ B-tree indexes that are frequently modified
+
+   
+   
+
+ Partitioning enhancements that include query performance improvements
+ on tables with thousands of partitions, improved performance with
+  and , and the
+ ability to execute 
+ ATTACH PARTITION without blocking queries
+
+   
+   
+
+ Common table expressions
+ (CTE) can now be automatically inlined
+
+   
+   
+
+ Reduction of WAL overhead of
+ GiST, GIN, and
+ SP-GiST indexes and added support
+ for covering indexes via the 
+ INCLUDE clause for
+ GiST indexes
+
+   
+   
+
+  now supports most-common value
+ (MCV) statistics to support improved query plans for columns that are
+ nonuniformly distributed
+
+   
+  
+ 
+
+
+
+ 
+  Enhancements to administrative functionality, including:
+  
+   
+
+ Rebuild an index without blocking writes to a table with the
+ 
+ REINDEX CONCURRENTLY command
+
+   
+   
+
+  can enable/disable page checksums,
+ used for detecting data corruption, in an offline cluster
+
+
+ Progress reporting statistics for ,
+ , ,
+ VACUUM FULL, and
+ 
+
+   
+  
+ 
+
+
+
+ 
+   Support for the SQL/JSON
+   path language
+ 
+
+
+
+ 
+   Stored generated columns
+ 
+
+
+
+ 
+   Nondeterministic ICU
+   collations that enable
+   case-insensitive and accent-insensitive ordering
+ 
+
+
+
+ 
+  New authentication features, including:
+  
+   
+
+ Client and server-side encryption for authentication over the
+ GSSAPI
+
+   
+   
+
+ Discovery of LDAP servers if PostgreSQL is
+ built with OpenLDAP
+
+   
+   
+
+ Support multifactor authentication with the 
+ clientcert=verify-full option when combined
+ with an additional authentication method in
+ pg_hba.conf
+
+   
+  
+ 
 
 





Re: [PATCH] kNN for btree

2019-09-02 Thread Alvaro Herrera
On 2019-Jul-12, Nikita Glukhov wrote:

> Attached 13th version of the patches.

Hello Nikita,

Can you please rebase this again?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_waldump and PREPARE

2019-09-02 Thread Alvaro Herrera
On 2019-Aug-01, Michael Paquier wrote:

> I may be missing something of course, but in this case we argued about
> adding a couple more fields.  In consequence, the patch should be
> waiting on its author, no?

Fujii,

Are you in a position to submit an updated version of this patch?

Maybe Vignesh is in a position to help to complete this, since he has
been eyeing this code lately.  Vignesh?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Contribution to Perldoc for TestLib module in Postgres

2019-09-02 Thread Alvaro Herrera
On 2019-Jul-30, Michael Paquier wrote:

> I think that the SYNOPSIS could be shaped better.  As of now it is a
> simple succession of the same commands listed without any link to each
> other, which is contrary for example to what we do in PostgresNode.pm,
> where it shows up a set of small examples which are useful to
> understand how to shape the tests and the possible interactions
> between the routines of the module.  My take would be to keep it
> simple and minimal as TestLib.pm is the lowest level of our TAP test
> infrastructure.

Agreed ... that's pretty much the same thing I tried to say upthread.  I
wrote my own synopsis, partly using your suggestions.  I reworded the
description for the routines (I don't think there's any I didn't touch),
added a mention of $windows_os, added a =head1 to split out the ad-hoc
routines from the Test::More wrappers.

And pushed.

Please give it another look.  It might need more polish.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 12 draft release notes

2019-09-02 Thread Andrey Borodin



> On 2 Sep 2019, at 22:02, Jonathan S. Katz wrote:
> 
> 
> Attached is a patch proposing items for the major items section. This is
> working off of the ongoing draft of the press release[1]. Feedback
> welcome. With respect to the linking, I tried to give a bunch of
> jumping off points for users to explore the features, but visually tried
> to ensure the readability was consistent.

+
+ Reduction of WAL overhead of
+ GiST, GIN, and
+ SP-GiST indexes and added support
+ for covering indexes via the 
+ INCLUDE clause for
+ SP-GiST indexes
+

Maybe I'm missing something, but covering indexes are supported in GiST, not in 
SP-GiST.

Best regards, Andrey Borodin.



Re: XPRS

2019-09-02 Thread Tomas Vondra

On Mon, Sep 02, 2019 at 02:19:15PM +1200, Thomas Munro wrote:

On Sat, Aug 24, 2019 at 3:19 AM Tomas Vondra
 wrote:

> Although “The Wei Hong Optimizer” was designed in the context of
> Postgres, it became the standard approach for many of the parallel
> query optimizers in industry."

I assume this quote is from 30 years ago. I wonder if the claim is still
true, on current hardware (including e.g. distributed databases).


The quote is from 2018, and appears in the article I linked (it's a
chapter from the book Making Databases Work: The Wisdom of Michael
Stonebraker), but I'm not sure which systems it's referring to.



Hmm, that's unfortunate - it'd be quite interesting to know which
databases it's referring to. I suspect no optimizer is ideal in this
regard, i.e. each database has some "gaps" where some nodes don't have a
straightforward parallel version.


Speculation:  Many other systems have what we call parallel-oblivious
operators only, and then insert various exchange operators to make a
parallel plan.  That is, they don't necessarily have a direct
equivalent of our "Parallel Sequential Scan", "Parallel Index Scan",
"Parallel Foreign Scan": they just use their regular scans, possibly
with the addition of some kind of "Parallel Scatter" node (that's my
made up name, it's the opposite of Gather, called various things like
"page supplier" or "block iterator") or "Parallel Repartition"
inserted in the right places.  Perhaps they create a serial plan
first, and then try to parallelise it by inserting various parallel
operators and then recomputing the costs?  Rather than considering the
separate paths in the first phase of the optimiser, as we do.  The
cases where Hong's best-parallel-plan hypothesis isn't true for us now
might go away if we had Parallel Repartition, so that each 'stream'
would be the complete set of tuples for some known partition.



I don't know. It kinda reminds me planning with distributed databases,
which also need exchange data between nodes in various cases - say, when
joining two relations distributed in different ways. The redistribution is
however pretty costly (network I/O, bandwidth etc.) to the extent that
it's often much better to pick a very different join to reduce the amount
of data to exchange, or eliminate the redistribution altogether. For
parallelism the costs are much lower, of course, but I don't think we can
just ignore those.

FWIW it's not clear to me why the cost would need to be recomputed after
constructing the parallel version of the plan? My understanding is that
the idea is to do cost-based planning for the serial plan, and then just
"mechanically" construct a parallel plan. Although, maybe there could be
multiple parallel alternatives ...


To be clear, I'm not suggesting we do that necessarily, just pointing
out some interesting claims about ancient POSTGRES wisdom, in a highly
speculative water cooler thread.   Actually, this isn't the first time
it's occurred to me that elements of our design were falling out of
the order that we chose to implement things in.  Another example is
the "Shared Hash" that I had in an early version of my work on
Parallel Hash Join, where just one process would run a parallel-safe
but non-parallel-oblivious plan to build a shared hash table while
other workers twiddled their thumbs; I dropped it because our cost
model has no penalty for running N copies of the same plan rather than
just one so there was no way to pick that plan, and that's because we
don't have a cost model like Hong's that considers resource usage too.
Another more speculative observation: maybe no-partition/shared
Parallel Hash Join is only obvious if you already have the general
concept of parallel-aware executor nodes.  AFAIK Robert and Amit
invented those to be able to coordinate parallel scans between
processes, where thread-based systems might be able to share a single
scan state somehow under a Scatter-like operator.  If you had threads,
you might not need that concept that allows arbitrary executor nodes
to communicate with each other between workers, and then it might be
more obvious and natural to use repartitioning for parallelising hash
joins.


FWIW I think we'll have to do something about resource acquisition, sooner
or later. It was always quite annoying that we don't really consider
memory consumption of the query as a whole during planning, and parallel
query made it a bit more painful.


Agreed.

Here's an approach I have been wondering about to cap total executor
memory usage, which is a much more down-to-Earth idea than any of the
above space cadet planner problems.  Let's start at the other end of
the problem, by introducing admission control and memory quotas.  That
is, keep using work_mem with its current per-node-copy meaning at
planning time, for now, and then:

1.  Compute the peak amount of memory each plan thinks it will need.
Initially that could be done by summing estimates from all nodes
and considering workers.  A lat

Re: PG 12 draft release notes

2019-09-02 Thread Jonathan S. Katz
On 5/11/19 4:33 PM, Bruce Momjian wrote:
> I have posted a draft copy of the PG 12 release notes here:
> 
>   http://momjian.us/pgsql_docs/release-12.html
> 
> They are committed to git.  It includes links to the main docs, where
> appropriate.  Our official developer docs will rebuild in a few hours.
> 

Thanks again for compiling and writing up the release notes. I know it
is no small effort.

Attached is a patch proposing items for the major items section. This is
working off of the ongoing draft of the press release[1]. Feedback
welcome. With respect to the linking, I tried to give a bunch of
jumping off points for users to explore the features, but visually tried
to ensure the readability was consistent.

I also attached a patch addressing the "MENTION ITS AFFECT ON ORDERING?"
in which I chose to answer it "yes" and added a comment :)

Thanks,

Jonathan

[1]
https://www.postgresql.org/message-id/c56eeb88-4a8c-2c6c-b5f1-9d46872c247c%40postgresql.org
diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml
index 8b668ade0f..e5929f887c 100644
--- a/doc/src/sgml/release-12.sgml
+++ b/doc/src/sgml/release-12.sgml
@@ -21,7 +21,125 @@

 
 
- big item
+ 
+  General performance improvements, including:
+  
+   
+
+ Optimizations to space utilization and read/write performance for
+ B-tree indexes that are frequently modified
+
+   
+   
+
+ Partitioning enhancements that include query performance improvements
+ on tables with thousands of partitions, improved performance with
+  and , and the
+ ability to execute 
+ ATTACH PARTITION without blocking queries
+
+   
+   
+
+ Common table expressions
+ (CTE) can now be automatically inlined
+
+   
+   
+
+ Reduction of WAL overhead of
+ GiST, GIN, and
+ SP-GiST indexes and added support
+ for covering indexes via the 
+ INCLUDE clause for
+ SP-GiST indexes
+
+   
+   
+
+  now supports most-common value
+ (MCV) statistics to support improved query plans for columns that are
+ nonuniformly distributed
+
+   
+  
+ 
+
+
+
+ 
+  Enhancements to administrative functionality, including:
+  
+   
+
+ Rebuild an index without blocking writes to a table with the
+ 
+ REINDEX CONCURRENTLY command
+
+   
+   
+
+  can enable/disable page checksums,
+ used for detecting data corruption, in an offline cluster
+
+
+ Progress reporting statistics for ,
+ , ,
+ VACUUM FULL, and
+ 
+
+   
+  
+ 
+
+
+
+ 
+   Support for the SQL/JSON
+   path language
+ 
+
+
+
+ 
+   Stored generated columns
+ 
+
+
+
+ 
+   Nondeterministic ICU
+   collations that enable
+   case-insensitive and accent-insensitive ordering
+ 
+
+
+
+ 
+  New authentication features, including:
+  
+   
+
+ Client and server-side encryption for authentication over the
+ GSSAPI
+
+   
+   
+
+ Discovery of LDAP servers if PostgreSQL is
+ built with OpenLDAP
+
+   
+   
+
+ Support multifactor authentication with the 
+ clientcert=verify-full option when combined
+ with an additional authentication method in
+ pg_hba.conf
+
+   
+  
+ 
 
 

diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml
index 8b668ade0f..20ca5eb569 100644
--- a/doc/src/sgml/release-12.sgml
+++ b/doc/src/sgml/release-12.sgml
@@ -1941,9 +1941,9 @@ Author: Peter Eisentraut 
   
 
   
-   This allows case and accent-agnostic equality comparisons using
-   "nondeterministic" collations.  This is only supported for
-   ICU collations.  MENTION ITS AFFECT ON ORDERING?
+   This allows case and accent-agnostic equality comparisons, such as 
during
+   an ordering operation, using "nondeterministic" collations.  This is 
only
+   supported for ICU collations.
   
  
 




Re: safe to overload objectSubId for a type?

2019-09-02 Thread Chapman Flack
On 09/02/19 11:41, Tom Lane wrote:
> Hm, apparently we already do handle that in some way, because
> this works:
> ...
> Nonetheless, I'd be pretty hesitant to allow somebody to use
> objsubid with some entirely different meaning for types.

As long as it stays an internal detail of a caching scheme used by me
and will be masked back to 0 before anything escapes elsewhere, it
really only exploits the current fact that objSubId for types carries
no information at all ... and no one seems to be chiming in with a pressing
case for changing that ... I think I can get away with the simple scheme,
admitting the possibility of having to revisit it down the road.

Regards,
-Chap




Re: [PATCH] Implement uuid_version()

2019-09-02 Thread Alvaro Herrera
On 2019-Jul-13, Jose Luis Tallon wrote:

> Considering the later arguments on-list, I plan on submitting a more
> elaborate patchset integrating the feedback received so far, and along the
> following lines:
> 
> - uuid type, v4 generation and uuid_version() in core
> 
> - Provide a means to rename/supercede extensions keeping backwards
> compatibility (i.e. uuid-ossp -> uuid, keep old code working)

It is wholly unclear what this commitfest entry is all about; in the
thread there's a mixture about a new uuid_version(), some new v4 stuff
migrating from pgcrypto (which apparently was done), plus some kind of
mechanism to allow upgrading extension names; all this stemming from
feedback from the patch submitted in April.  But there hasn't been a new
patch in a long time, and there won't be a new patch version during the
current commitfest.  Therefore, I'm closing this entry as Returned with
Feedback.  The author(s) can create a new entry in a future commitfest
once they have a new patch.

I do suggest to keep such a patch restricted in scope.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: roll pg_stat_statements into core

2019-09-02 Thread Tom Lane
Euler Taveira  writes:
> At least if pg_stat_statements
> was in core you could load it by default and have a GUC to turn it
> on/off without restarting the server (that was Magnus proposal and
> Andres agreed).

That assertion is 100% bogus.  To turn it on or off on-the-fly,
you'd need some way to acquire or release its shared memory
on-the-fly.

It's probably now possible to do something like that, using the
DSM mechanisms, but the code for it hasn't been written (AFAIK).
And it wouldn't have much to do with whether the module was
in core or stayed where it is.

Another concern that I have about moving pg_stat_statements
into core is that doing so would effectively nail down
Query.queryId as belonging to pg_stat_statements, whereas
currently it's possible for other plugins to commandeer that
if they wish.  This isn't academic, because of the fact that
not everybody is satisfied with the way pg_stat_statements
defines queryId [1].

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/1553029215728-0.post%40n3.nabble.com




Re: safe to overload objectSubId for a type?

2019-09-02 Thread Tom Lane
Chapman Flack  writes:
> On 09/02/19 00:29, Tom Lane wrote:
>> If we ever do make ObjectAddress.objectSubId mean something for types,
>> I'd expect --- based on the precedent of relation columns --- that it'd
>> specify a column number for a column of a composite type.  There are
>> fairly obvious use-cases for that, such as allowing a DROP of a column
>> type to not propagate to the whole composite type.

> Could you give an illustration to make sure I'm following?

Hm, apparently we already do handle that in some way, because
this works:

regression=# create type myenum as enum ('red', 'black');
CREATE TYPE
regression=# create type mycomposite as (f1 int, f2 myenum, f3 text);
CREATE TYPE
regression=# drop type myenum;
ERROR:  cannot drop type myenum because other objects depend on it
DETAIL:  column f2 of composite type mycomposite depends on type myenum
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
regression=# drop type myenum cascade;
NOTICE:  drop cascades to column f2 of composite type mycomposite
DROP TYPE
regression=# \d mycomposite
Composite type "public.mycomposite"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 f1     | integer |           |          |
 f3     | text    |           |          |

I'm too lazy to go check, but I suspect that the representation
of this situation in pg_depend makes use of a dependency for a
column of the relation associated with the mycomposite type,
so that the entry having nonzero objsubid has classid pg_class
not classid pg_type.

Nonetheless, I'd be pretty hesitant to allow somebody to use
objsubid with some entirely different meaning for types.  That
seems certain to lead to confusion and bugs.

regards, tom lane




Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2019-09-02 Thread Etsuro Fujita
On Wed, Aug 14, 2019 at 11:51 AM Etsuro Fujita  wrote:
> This is my TODO item for PG13, but I'll give priority to other things
> in the next commitfest.  If anyone wants to work on it, feel free;
> else I'll move this to the November commitfest when it opens.

Moved.

Best regards,
Etsuro Fujita




Re: add a MAC check for TRUNCATE

2019-09-02 Thread Kohei KaiGai
Hello Yuli,

On Thu, Jul 25, 2019 at 3:52, Yuli Khodorkovskiy wrote:
> Since all DAC checks should have corresponding MAC, this patch adds a
> hook to allow extensions to implement a MAC check on TRUNCATE. I have
> also implemented this access check in the sepgsql extension.
>
> One important thing to note is that refpolicy [1] and Redhat based
> distributions do not have the SELinux permission for db_table {truncate}
> implemented.
>
How is the db_table:{delete} permission different from truncate?
From the standpoint of data access, TRUNCATE is equivalent to DELETE
without WHERE, isn't it?
Of course, there are some differences between them. TRUNCATE takes
exclusive locks and eliminates the underlying data blocks; on the other
hand, DELETE removes rows in an MVCC manner. However, both of them
eventually remove data from the target table.

I would recommend reusing the "db_table:{delete}" permission for TRUNCATE.
What are your opinions?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: pg_dump --exclude-* options documentation

2019-09-02 Thread Tom Lane
Fabien COELHO  writes:
>> Should we change the documentation of the old pg_dump options to also
>> take a "pattern", or change the documentation of the new pg_dumpall
>> option to read "database"?

> My 0.02€: we should use the more general and precise, i.e. pattern.

+1 ... it doesn't seem to me that "--exclude-schema=schema" conveys
anything useful.

regards, tom lane




Re: safe to overload objectSubId for a type?

2019-09-02 Thread Chapman Flack
On 09/02/19 00:29, Tom Lane wrote:

> If this is totally independent of ObjectAddress, why are you even
> asking?  I assume that what you mean is you'd like these values to
> bleed into ObjectAddresses or vice versa.

Only that I'm working on a data structure of my own to cache my own
representations for objects, which I can find by a simple hash lookup
on a key of three ints, given an ObjectAddress. But I'd like to have
distinct instances representing distinct typmods of a type, and if I
had a method to look up a type instance given an ObjectAddress and
typmod, and that method could simply use the typmod (or a trivially
transformed version, -1 <-> 0) as the third int of the key, then my
single hash structure and one lookup serves for everything.

Otherwise, I essentially need a second whole layer of the same machinery
to track typmod-variants of types (and forget them when unreferenced,
invalidated, yada yada), or variable-length cache keys so type keys
are four ints instead of three, or four-int keys for everything, just
because of types.

Any of those alternatives would just be a SMOP if truly necessary,
but uglier and more complex.


> If we ever do make ObjectAddress.objectSubId mean something for types,
> I'd expect --- based on the precedent of relation columns --- that it'd
> specify a column number for a column of a composite type.  There are
> fairly obvious use-cases for that, such as allowing a DROP of a column
> type to not propagate to the whole composite type.

Could you give an illustration to make sure I'm following? I tried to
create my own example with:

CREATE TYPE foo AS (bar int, baz int);
CREATE TABLE quux (a foo);
INSERT INTO quux (a.bar, a.baz) VALUES (12, 34);
SELECT * FROM quux;
    a
---------
 (12,34)

ALTER TYPE foo DROP ATTRIBUTE baz;
SELECT * FROM quux;
  a
------
 (12)

but I guess that must not exercise the case you had in mind.


I could say my main reason for asking was to judge the imminence of
any change. If there's a vague conceivable possibility of objectSubId
meaning something for types in the indefinite future, I might do my
simple implementation now, and some uglier SMOP alternative later
when/if needed. If somebody has such a change in mind for the near
term, that'd be different.

Regards,
-Chap




Re: Commit fest 2019-09

2019-09-02 Thread Alvaro Herrera
On 2019-Sep-01, Michael Paquier wrote:

> I am not sure if somebody would like to volunteer, but I propose
> myself as commit fest manager for the next session.  Feel free to
> reply to this thread if you feel that you could help out as manager
> for this time.

Hello,

Thanks Michael for handling these tasks over the weekend.

I had offered myself as CFM for this one when the cycle started, so I'm
happy to do it.

At this time, the commitfest starts with:

         status         │ count
────────────────────────┼───────
 Needs review           │   165
 Waiting on Author      │    30
 Ready for Committer    │    11
 Returned with Feedback │     1
 Moved to next CF       │     2
 Committed              │    14
 Rejected               │     1
 Withdrawn              │     4

There's an embarrassingly large number of patches in the "Bug Fixes"
section.  I encourage reviewers to look at those that "Need Review" so
that they can set "Ready for Committer" and get a committer to finish
them up:
       status        │ count
─────────────────────┼───────
 Committed           │     2
 Needs review        │    20
 Ready for Committer │     2
 Waiting on Author   │     4
 Withdrawn           │     1

Thanks,

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: roll pg_stat_statements into core

2019-09-02 Thread Euler Taveira
On Mon, Sep 2, 2019 at 05:11, Adrien Nayrat wrote:
>
> On 9/1/19 8:54 PM, Tom Lane wrote:
> >> - The overhead for most use cases is low compared to the benefit.
> > Please do not expect that we're going to accept such assertions
> > unsupported by evidence.  (As a very quick-n-dirty test, I tried
> > "pgbench -S" and got somewhere around 4% TPS degradation with
> > pg_stat_statements loaded.  That doesn't seem negligible.)
>
> AFAIR Andres pointed out that the overhead could be much higher when you
> have more queries than pg_stat_statements.max [1]. Eviction can be costly.
>
pg_stat_statements is important for a lot of query analysis. If you
compare pg_stat_statements with pgbadger, you can capture queries for
the latter by changing log_min_duration_statement or log_statement
without restarting the server; for the former, however, you can't
without restarting the server. If pg_stat_statements were in core, you
could at least load it by default and have a GUC to turn it on/off
without restarting the server (that was Magnus's proposal, and Andres
agreed). I support this idea. In v10 we changed some GUCs to provide
replication out of the box; we should do the same for query
analysis.

Regards,


-- 
   Euler Taveira                 Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Index Skip Scan

2019-09-02 Thread Dmitry Dolgov
> On Wed, Aug 28, 2019 at 9:32 PM Floris Van Nee  
> wrote:
>
> I'm afraid I did manage to find another incorrect query result though

Yes, it's an example of what I was mentioning before: the current modified
implementation of `_bt_readpage` wouldn't work well when going between
pages. So far it seems that the only problem we can have is when the
previous and next items are located on different pages. I've checked how
this issue can be avoided, and I hope to post a new version relatively soon.




Re: SegFault on 9.6.14

2019-09-02 Thread Amit Kapila
On Fri, Aug 9, 2019 at 6:29 PM Robert Haas  wrote:
>
> On Wed, Aug 7, 2019 at 5:45 AM vignesh C  wrote:
> > I have made a patch based on the above lines.
> > I have tested the scenarios which Thomas had shared in the earlier
> > mail and few more tests based on Thomas's tests.
> > I'm not sure if we will be going ahead with this solution or not.
> > Let me know your opinion on the same.
> > If you feel this approach is ok, we can add few of this tests into pg tests.
>
> I think this patch is bizarre:
>
> - It introduces a new function called getParallelModeLevel(), which is
> randomly capitalized different from the other functions that do
> similar things, and then uses it to do the same thing that could have
> been done with the existing function IsInParallelMode().
> - It contains an "if" statement whose only content is an Assert().
> Don't write if (a) Assert(b); write Assert(!a || b).
> - It contains zero lines of comment changes, which is obviously not
> enough for a patch that proposes to fix a very thorny issue.  This
> failure has two parts.  First, it adds no new comments to explain the
> bug being fixed or the theory of operation of the new code. Second, it
> does not even bother updating existing comments that are falsified by
> the patch, such as the function header comments for
> ExecParallelCleanup and ExecShutdownGather.
> - It changes what ExecParallelCleanup does while adjusting only one of
> the two callers to match the behavior change.  nodeGatherMerge.c
> manages to be completely untouched by this patch.  If you change what a
> function does, you really need to grep for all the calls to that
> function and adjust all callers to match the new set of expectations.
>
> It's a little hard to get past all of those issues and look at what
> the patch actually does, but I'm going to try: the theory of operation
> of the patch seems to be that we can skip destroying the parallel
> context when performing ExecParallelCleanup and in fact when exiting
> parallel mode, and then when we get to executor end time the context
> will still be there and we can fish the instrumentation out of it. But
> this seems problematic for several reasons.  For one thing, as Amit
> already observed, the code currently contains an assertion which
> ensure that a ParallelContext can't outlive the time spent in parallel
> mode, and it doesn't seem desirable to relax that assertion (this
> patch removes it).
>
> But beyond that, the issue here is that the Limit node is shutting
> down the Gather node too early, and the right fix must be to stop
> doing that, not to change the definition of what it means to shut down
> a node, as this patch does.  So maybe a possible approach here - which
> I think is more or less what Tom is proposing - is:
>
> 1. Remove the code from ExecLimit() that calls ExecShutdownNode().
>

The attached patch does that.  I have also added one test as a separate
patch so that later, if we introduce shutting down resources in the Limit
node, we don't break anything.  As of now, I have kept it separate for
easy verification, but if we decide to go with this approach and the test
appears fine, we can merge it along with the fix.

> 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode()
> gets called at the very end of execution, at least when execute_once
> is set, before exiting parallel mode.
>

I am not sure if I completely understand this point.  AFAICS,
ExecShutdownNode is called whenever we are done getting the tuples.
One place in that function where it is not called is when we assume the
destination is closed, basically the code below:
ExecutePlan()
{
..
if (!dest->receiveSlot(slot, dest))
break;
..
}

Do you expect this case to be dealt with as well, or do you have
something else in mind?  The other possibility could be to move the
shutdown of the node to the end of the function, where we exit parallel
mode, but doing that led to some regression failures on my machine.  I
will investigate.

> 3. Figure out, possibly at a later time or only in HEAD, how to make
> the early call to ExecLimit() in ExecShutdownNode(), and then put it
> back.

Okay, Tom has suggested a design to address this, but that will be for
HEAD only.  To be clear, I am not planning to spend time on that at the
moment, but OTOH, I want the bug reported in this thread to be closed,
so for now we need to proceed with some minimal fix as mentioned by you
in the above two points.  If someone else can write a patch, I can help
review it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Forbid-Limit-node-to-shutdown-resources.patch
Description: Binary data


0002-Test-rescans-when-parallel-node-is-beneath-Limit-nod.patch
Description: Binary data


Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Masahiko Sawada
On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier  wrote:
>
> On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > I tried 1. and it showed index_rebuild_count, but it also showed
> > > "initializing" phase again and other columns were empty. So, it is
> > > not suitable to fix the problem. :(
> > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > time to do that after PGConf.Asia 2019.
> > > If we selected 3., it affects following these progress reporting
> > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > I suppose. Any comments welcome! :)
> >
> > I looked at this open item. I prefer #3 but I think it would be enough
> > to have a small stack using a fixed length array to track nested
> > progress information and the current executing command (maybe 4 or 8
> > would be enough for maximum nested level for now?). That way, we don't
> > need any change the interfaces. AFAICS there is no case where we call
> > only either pgstat_progress_start_command or
> > pgstat_progress_end_command without each other (although
> > pgstat_progress_update_param is called without start). So I think that
> > having a small stack for tracking multiple reports would work.
> >
> > Attached the draft patch that fixes this issue. Please review it.
>
> Do we actually want to show the user information about CREATE INDEX
> that is different from CLUSTER?  It could be confusing for the user
> to see a progress report from a command different from the one
> actually launched.

I personally think it would be helpful for users. We provide progress
reporting for each command, but it might not be detailed enough. We can
provide more detailed progress information for each command by combining
them. Only users who want to confirm the details need to look at the
different progress reports.

> There could be a method 4 here: do not start a new
> command progress when there is another one already started, and do not
> try to end it in the code path where it could not be started as it did
> not stack.  So while I see the advantages of stacking the progress
> records as you do when doing cascading calls of the progress
> reporting, I am not sure that:
> 1) We should bloat more PgBackendStatus for that.
> 2) We want to add more complication in this area close to the
> release of 12.

Agreed, especially on 2. We can live with method #4 in PG12, and the
patch I proposed could be discussed as a new feature.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: SIGQUIT on archiver child processes maybe not such a hot idea?

2019-09-02 Thread Kyotaro Horiguchi
At Mon, 2 Sep 2019 15:51:34 +0900, Michael Paquier  wrote 
in <20190902065134.ge1...@paquier.xyz>
> On Mon, Sep 02, 2019 at 12:27:09AM +, Tsunakawa, Takayuki wrote:
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> >> After investigation, the mechanism that's causing that is that the
> >> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
> >> down its replica server with a mode-immediate stop, which causes
> >> that postmaster to shut down all its children with SIGQUIT, and
> >> in particular that signal propagates to a "cp" command that the
> >> archiver process is executing.  The "cp" is unsurprisingly running
> >> with default SIGQUIT handling, which per the signal man page
> >> includes dumping core.
> > 
> > We've experienced this (core dump in the data directory by an
> > archive command) years ago.  Related to this, the example of using
> > cp in the PostgreSQL manual is misleading, because cp doesn't
> > reliably persist the WAL archive file.
> 
> The previous talks about having pg_copy are still where they were a
> couple of years ago as we did not agree on which semantics it should
> have.  If we could move forward with that and update the documentation
> from its insanity that would be great and...  The signal handling is
> something else we could customize in a more favorable way with the
> archiver.  Anyway, switching from something else than SIGQUIT to stop
> the archiver will not prevent any other tools from generating core
> dumps with this other signal.

Since we allow operators to use an arbitrary command as
archive_command, providing a replacement with non-standard signal
handling for one specific command doesn't seem like a general solution
to me.  Couldn't we have pg_system (a tentative name), which intercepts
SIGQUIT and then sends SIGINT to its children?  We might need to resend
SIGQUIT after some interval, though.

> > We enable the core dump in production to help the investigation just in 
> > case.
> 
> So do I in some of the stuff I work on.
> 
> > some_command also catches SIGQUIT just exit.  It copies and syncs the file.
> > 
> > I proposed something in this line as below, but I couldn't respond to 
> > Peter's review comments due to other tasks.  Does anyone think it's worth 
> > resuming this?
> > 
> > https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau
> 
> And I was looking for this thread a couple of lines ago :)
> Thanks.

# Is there any means to view the whole of a thread from the archive?
# I'm kind of reluctant to wander among messages like a rat in
# a maze :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 08:06:22AM +, r.takahash...@fujitsu.com wrote:
> I'm not sure whether this patch should be applied to postgres below 11
> since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the 
> same error.

Thinking wider, don't we have the same problem with wal_sender_timeout
in the case where a sync request takes longer than the time it would
take the backend to terminate the connection?
--
Michael




Re: Proposal: roll pg_stat_statements into core

2019-09-02 Thread Adrien Nayrat
On 9/1/19 8:54 PM, Tom Lane wrote:
>> - The overhead for most use cases is low compared to the benefit.
> Please do not expect that we're going to accept such assertions
> unsupported by evidence.  (As a very quick-n-dirty test, I tried
> "pgbench -S" and got somewhere around 4% TPS degradation with
> pg_stat_statements loaded.  That doesn't seem negligible.)

AFAIR Andres pointed out that the overhead could be much higher when you
have more queries than pg_stat_statements.max [1]. Eviction can be costly.

1: https://twitter.com/AndresFreundTec/status/1105585237772263424







RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread r.takahash...@fujitsu.com
Hi Michael-san,


> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced.  Takahashi-san, what do you
> think?

Thank you for creating the patch.
This patch is what I expected.

I'm not sure whether this patch should be applied to postgres below 11
since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same 
error.

Regards,
Ryohei Takahashi





Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Michael Paquier
On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > I tried 1. and it showed index_rebuild_count, but it also showed
> > "initializing" phase again and other columns were empty. So, it is
> > not suitable to fix the problem. :(
> > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > time to do that after PGConf.Asia 2019.
> > If we selected 3., it affects following these progress reporting
> > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > I suppose. Any comments welcome! :)
> 
> I looked at this open item. I prefer #3 but I think it would be enough
> to have a small stack using a fixed length array to track nested
> progress information and the current executing command (maybe 4 or 8
> would be enough for maximum nested level for now?). That way, we don't
> need any change the interfaces. AFAICS there is no case where we call
> only either pgstat_progress_start_command or
> pgstat_progress_end_command without each other (although
> pgstat_progress_update_param is called without start). So I think that
> having a small stack for tracking multiple reports would work.
> 
> Attached the draft patch that fixes this issue. Please review it.

Do we actually want to show the user information about CREATE INDEX
that is different from CLUSTER?  It could be confusing for the user
to see a progress report from a command different from the one
actually launched.  There could be a method 4 here: do not start a new
command progress when there is another one already started, and do not
try to end it in the code path where it could not be started as it did
not stack.  So while I see the advantages of stacking the progress
records as you do when doing cascading calls of the progress
reporting, I am not sure that:
1) We should bloat more PgBackendStatus for that.
2) We want to add more complication in this area close to the
release of 12.

Another solution, as mentioned by Yamada-san, is just to start the
progress for the cluster command again in reindex_relation(); however,
there is an issue here, as reindex_relation() is also called by REINDEX
TABLE.  I actually find it very weird that we have a cluster-related
field added for REINDEX, so it seems to me that all the interactions
between the code paths of CLUSTER and REINDEX have not been completely
thought through.  This part was added by 6f97457 :(
--
Michael




Re: block-level incremental backup

2019-09-02 Thread Jeevan Ladhe
Hi Robert,

On Sat, Aug 31, 2019 at 8:29 AM Robert Haas  wrote:

> On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe
>  wrote:
> > Due to the inherent nature of pg_basebackup, the incremental backup also
> > allows taking backup in tar and compressed format. But, pg_combinebackup
> > does not understand how to restore this. I think we should either make
> > pg_combinebackup support restoration of tar incremental backup or
> restrict
> > taking the incremental backup in tar format until pg_combinebackup
> > supports the restoration by making option '--lsn' and '-Ft' exclusive.
> >
> > It is arguable that one can take the incremental backup in tar format,
> extract
> > that manually and then give the resultant directory as input to the
> > pg_combinebackup, but I think that kills the purpose of having
> > pg_combinebackup utility.
>
> I don't agree. You're right that you would have to untar (and
> uncompress) the backup to run pg_combinebackup, but you would also
> have to do that to restore a non-incremental backup, so it doesn't
> seem much different.
>

Thanks. Yes I agree about the similarity between restoring non-incremental
and incremental backup in this case.


>  I don't think it's worth doing that at this point; I definitely don't
> think it
> needs to be part of the first patch.
>

Makes sense.

Regards,
Jeevan Ladhe


Re: Bug in GiST paring heap comparator

2019-09-02 Thread Heikki Linnakangas

On 02/09/2019 07:54, Alexander Korotkov wrote:

> Andrey Borodin pointed out to me that the results of some KNN-GiST tests
> are obviously wrong and don't match the results of a sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>         f1
> -------------------
>  (10,10)
>  (NaN,NaN)
>  (0,0)
>  (1e-300,-1e-300)
>  (-3,4)
>  (-10,0)
>  (-5,-12)
>  (5.1,34.5)
>
>  (1e+300,Infinity)
> (10 rows)


So we've memorized incorrect results in the expected output. Ouch!


> It appears to be related to the implementation of the comparison function
> in the pairing heap used as a priority queue for KNN.  It used plain float
> comparison, which doesn't handle Inf and NaN values well.  The attached
> patch replaces it with float8_cmp_internal().

> Also, note that with the patch, KNN results still don't fully match the
> results of a sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>         f1
> -------------------
>  (0,0)
>  (1e-300,-1e-300)
>  (-3,4)
>  (-10,0)
>  (10,10)
>  (-5,-12)
>  (5.1,34.5)
>  (1e+300,Infinity)
>
>  (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped.  This happens because we assume the
> distance to NULL to be Inf, while float8_cmp_internal() treats NaN as
> greater than Inf.  Even if we assumed the distance to NULL to be NaN,
> that wouldn't guarantee that NULLs come last.  It looks like we can
> handle this only by introducing an array of "distance is null" flags
> in GISTSearchItem.  But is it worth it?


I don't think we have much choice; returning wrong results is not an 
option. At first I thought we could set the "recheck" flag if we see 
NULLs or NaNs, but that won't work because rechecking is not supported 
with Index-Only Scans.


Looking at the corresponding SP-GiST code, 
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest 
copy-pasting the implementation from there, so that they would be as 
similar as possible. (SP-GiST gets away with just one isNull flag, 
because it doesn't support multi-column indexes. In GiST, you need an 
array, as you said.)


- Heikki