Re: GetSnapshotData round two(for me)

2018-09-24 Thread Dilip Kumar
On Tue, Sep 25, 2018 at 11:00 AM, Daniel Wood  wrote:
> I was about to suggest creating a single shared snapshot instead of having
> multiple backends compute what is essentially the same snapshot.  Luckily,
> before posting, I discovered Avoiding repeated snapshot computation from
> Pavan and POC: Cache data in GetSnapshotData() from Andres.
>
I think Mithun has also worked on this [1] and posted some analysis of
the cases where he saw an improvement and the cases where it did not
perform well; you might want to have a look.

[1] 
https://www.postgresql.org/message-id/CAD__OujRZEjE5y3vfmmZmSSr3oYGZSHRxwDwF7kyhBHB2BpW_g%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



GetSnapshotData round two(for me)

2018-09-24 Thread Daniel Wood
I was about to suggest creating a single shared snapshot instead of having 
multiple backends compute what is essentially the same snapshot.  Luckily, 
before posting, I discovered Avoiding repeated snapshot computation 
https://www.postgresql.org/message-id/caboikdmsj4osxta7xbv2quhkyuo_4105fjf4n+uyroybazs...@mail.gmail.com
  from Pavan and POC: Cache data in GetSnapshotData() 
https://www.postgresql.org/message-id/20150202152706.gd9...@alap3.anarazel.de 
from Andres.


Andres, could I get a short summary of the biggest drawback that may have
prevented this from being released?  Before I saw this I had done my own
implementation and saw some promising results (25% on 48 cores).  I do need to
do some mixed RO and RW workloads to see how the invalidations of the shared
copy, at EOT time, affect the results.  There are some differences in my
implementation.  I chose, perhaps incorrectly, to busy-spin other users
trying to get a snapshot while the first backend in builds the shared copy.  My
thinking is to not increase the latency of using the snapshot.  The improvement
from the idea doesn't come from getting off the CPU by using a wait, but from
not reading, on all the CPUs acquiring the snapshot, PGXACT cache lines that are
constantly being dirtied.  One backend can do the heavy lifting and the others
can immediately jump on the shared copy once created.
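
To make that shape concrete, here is a minimal standalone sketch of the
pattern, using C11 atomics and pthreads.  Everything in it (SharedSnapshot,
build_real_snapshot, and so on) is a made-up stand-in for the real
PGXACT/ProcArray machinery -- it is not the actual patch, and it omits the
interesting part, namely invalidating the shared copy at EOT time:

/* Build with: cc -std=c11 -pthread sketch.c */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

typedef struct
{
    atomic_int  state;      /* 0 = invalid, 1 = being built, 2 = valid */
    int         xmin;       /* stand-ins for the real snapshot contents */
    int         xmax;
} SharedSnapshot;

static SharedSnapshot shared;   /* static storage: state starts at 0 */

static void
build_real_snapshot(int *xmin, int *xmax)
{
    /* pretend this is the expensive scan over all PGXACT entries */
    *xmin = 100;
    *xmax = 200;
}

static void
get_snapshot(int *xmin, int *xmax)
{
    int         expected = 0;

    if (atomic_compare_exchange_strong(&shared.state, &expected, 1))
    {
        /* first one in: build the shared copy and publish it */
        build_real_snapshot(&shared.xmin, &shared.xmax);
        atomic_store(&shared.state, 2);
    }
    else
    {
        /* everyone else busy-spins until the shared copy is published */
        while (atomic_load(&shared.state) != 2)
            ;               /* the real thing might use pg_spin_delay() here */
    }

    *xmin = shared.xmin;
    *xmax = shared.xmax;
}

static void *
worker(void *arg)
{
    int         xmin, xmax;

    get_snapshot(&xmin, &xmax);
    printf("thread %ld got snapshot [%d, %d]\n",
           (long) (intptr_t) arg, xmin, xmax);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[4];

    for (intptr_t i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}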


And something else quite weird:  As I was evolving a standard setup for
benchmark runs and getting baselines, I was getting horrible numbers
sometimes (680K) and other times I'd get over 1 million QPS.  I was thinking I
had a bad machine.  What I found was that even though I was running a fixed 192
clients, I had set max_connections to 600 sometimes and 1000 on other runs.
Here is what I see running select-only, scale-1000 pgbench with 192 clients on a
48-core box (2 sockets) using different values for max_connections:


 200 tps = 1092043
 250 tps = 1149490
 300 tps =  732080
 350 tps =  719611
 400 tps =  681170
 450 tps =  687527
 500 tps =  859978
 550 tps =  927161
 600 tps = 1092283
 650 tps = 1154916
 700 tps = 1237271
 750 tps = 1195968
 800 tps = 1162221
 850 tps = 1140626
 900 tps =  749519
 950 tps =  648398
1000 tps =  653460


This is on the base PG 12 codeline.  The only thought I've had so far is that
each PGXACT in use (192) is being scattered across the full set of
max_connections slots, instead of being physically contiguous in the first 192
slots.  This would cause more cache lines to be scanned.  It doesn't make a lot
of sense given that it goes back up again from 500, peaking at 700.  Also, this
is after a fresh restart, so the procs in the freelist shouldn't have been
scrambled yet in terms of ordering.
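
For what it's worth, some back-of-the-envelope arithmetic on the cache-line
theory looks like this.  It assumes a ~12-byte PGXACT and 64-byte cache lines
(both assumptions, not measured values), and it only quantifies the scattering
effect, not the odd dip-and-recover shape above:

#include <stdio.h>

int
main(void)
{
    const int   pgxact_size = 12;   /* assumed sizeof(PGXACT) */
    const int   cacheline = 64;
    const int   active = 192;       /* pgbench clients actually connected */
    const int   slots[] = {200, 600, 1000}; /* max_connections settings */

    for (int i = 0; i < 3; i++)
    {
        int     n = slots[i];
        /* contiguous: 192 entries packed into the first slots */
        int     packed = (active * pgxact_size + cacheline - 1) / cacheline;
        /*
         * scattered: worst case every active entry lands on its own line,
         * bounded by the number of lines the whole array spans
         */
        int     total_lines = (n * pgxact_size + cacheline - 1) / cacheline;
        int     scattered = active < total_lines ? active : total_lines;

        printf("max_connections=%4d: packed ~%d lines, scattered worst case ~%d lines\n",
               n, packed, scattered);
    }
    return 0;
}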


NOTE: I believe you'll only see this huge difference on a dual-socket machine.
It'd probably only take 30 minutes or so on a big machine to confirm, with a
couple of few-minute runs at different values for max_connections.  I'll be
debugging this soon, but I've been postponing it while experimenting with my
shared snapshot code.






Re: SSL tests failing with "ee key too small" error on Debian SID

2018-09-24 Thread Michael Paquier
On Tue, Sep 25, 2018 at 12:48:57PM +0900, Kyotaro HORIGUCHI wrote:
> Do you mean that cert/key files are generated on-the-fly while
> running 'make check'?  It sounds reasonable as long as just
> replaceing existing files with those with longer (2048bits?) keys
> doesn't work for all supported platforms.

The files are present by default in the tree, but can be regenerated
easily by using the makefile rule "sslfiles".  From what I can see, this
is caused by OpenSSL 1.1.1, to which Debian SID has visibly upgraded
recently.  That's the version I have on my system.  I have not dug much
into the Makefile to see if things could be done right by changing the
openssl commands, though...
--
Michael




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Michael Paquier
On Tue, Sep 25, 2018 at 01:49:09PM +1000, Haribabu Kommi wrote:
> Thanks for the review.
> Fixed in the attached patch as per your suggestion.

Hmm.  I see a problem with the tests and the stability of what
pg_stat_statements_reset() can return.  Normally installcheck is
disabled in contrib/pg_stat_statements/Makefile but if you remove this
barrier and run the tests with a server loading the module in
shared_preload_libraries then things are not stable.  We don't have this
kind of instability on HEAD.  A system-wide call to
pg_stat_statements_reset() is visibly missing.

+   if (!pgss || !pgss_hash)
+   ereport(ERROR,
+   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("pg_stat_statements must be loaded via 
shared_preload_libraries")));
This check can be within entry_reset().

+  the specified userid, dbid and queryid. Returns the total number of
+  statement statistics that are reset based on the specified input.
+  If any of the parameter is not specified, the default value NULL(invalid)
Missing some markup for the three field names here, as well as for NULL,
which is a value.

I can buy the compatibility breakage with the return result of
pg_stat_statements_reset when specified without arguments.

Some nannyism: If all entries are removed and a new file needs to be
written, you could save a bit of indentation by returning immediately
when (num_entries != num_remove).
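
In other words, something with roughly this shape (an illustrative sketch
with stand-in names; only entry_reset, num_entries and num_remove are taken
from the discussion above, and the bodies are placeholders rather than the
actual pg_stat_statements code):

#include <stdbool.h>
#include <stdio.h>

static bool pgss_loaded = true; /* stand-in for (pgss && pgss_hash) */

static long
entry_reset(long num_entries, long num_remove)
{
    if (!pgss_loaded)
    {
        fprintf(stderr, "pg_stat_statements must be loaded via shared_preload_libraries\n");
        return 0;
    }

    /* ... scan the hash table, removing matching entries (num_remove) ... */

    if (num_entries != num_remove)
        return num_remove;      /* partial reset: keep the query-text file */

    /* Full reset: truncate and rewrite the external query-text file. */
    /* ... gc/rewrite logic would go here, with no extra indentation ... */
    return num_remove;
}

int
main(void)
{
    printf("partial reset removed %ld entries\n", entry_reset(10, 3));
    printf("full reset removed %ld entries\n", entry_reset(10, 10));
    return 0;
}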
--
Michael




PG vs macOS Mojave

2018-09-24 Thread Tom Lane
Well, macOS 10.14 (Mojave) is out, so I installed it on a spare machine,
and naturally the first thing I tried was to build PG with it.  Our
core code seems fine, but:

* --with-perl fails in configure, complaining that it can't find perl.h.

* --with-tcl fails in configure, complaining that it can't find
tclConfig.sh.  Furthermore, the historical workaround for that
(--with-tclconfig=/System/Library/Frameworks/Tcl.framework) doesn't fix it.

After some investigation, it seems that Apple has been busy moving
header files (not libraries) under SDK-specific "sysroot" directories,
with the expectation that you'd compile using "-isysroot $SYSROOT".
There's some mention of that here, for example:

https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/cross_development/Configuring/configuring.html#//apple_ref/doc/uid/1163i-CH1-SW7

The sysroot seems to contain only headers; stuff you need at runtime,
such as shared libraries, is still where it used to be.

The recommended way to get the appropriate sysroot path seems to be

SYSROOT=`xcodebuild -version -sdk macosx Path`

Attached is a draft patch to fix things up.  The core ideas are

(1) Stop assuming that the Perl headers and library are necessarily
in the same place; create a perl_includedir variable to represent the
path to the former.

(2) Tweak src/template/darwin to inject the appropriate -isysroot
option into CPPFLAGS.

(3) Remove the need to manually specify the path to tclConfig.sh,
which has gotten even more painful than before because now it's
somewhere under the sysroot.  You can still specify --with-tclconfig
if you really want to, but it's not necessary anymore to build pltcl
under recent macOS.

Note that (3) alone is not sufficient to fix pltcl; we must do (2)
as well because tclConfig.sh now reports the Tcl include flags as
TCL_INCLUDE_SPEC= -iwithsysroot 
/System/Library/Frameworks/Tcl.framework/Versions/8.5/Headers
so unless we also set -isysroot this doesn't work.

It's a bit scary to be adding -isysroot globally.  I thought
briefly about using it only while building pltcl, but that seems
even more dangerous: if there were any discrepancies between the
headers in the sysroot and those in the normal include directories,
building pltcl with different headers from the rest of the system
would surely be disastrous.  In any case, I suspect that the handwriting
is on the wall, and before very much longer it's going to be impossible
to build meaningful code on macOS without -isysroot anyway.

I've tested this on all the macOS versions I have at hand, and it
doesn't seem to break anything.  Only part (1) could possibly
affect other platforms, and that seems safe enough.

I'd like to commit and backpatch this, because otherwise longfin
is going to start falling over when I upgrade its host to Mojave.

Thoughts?

regards, tom lane

diff --git a/configure b/configure
index 21ecd29..879eee4 100755
--- a/configure
+++ b/configure
@@ -668,6 +668,7 @@ python_majorversion
 PYTHON
 perl_embed_ldflags
 perl_embed_ccflags
+perl_includedir
 perl_useshrplib
 perl_privlibexp
 perl_archlibexp
@@ -9773,6 +9774,14 @@ You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl." "$LINENO" 5
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+  perl_includedir="$PG_SYSROOT$perl_archlibexp"
+fi
+  fi
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for CFLAGS recommended by Perl" >&5
 $as_echo_n "checking for CFLAGS recommended by Perl... " >&6; }
@@ -18355,7 +18364,7 @@ fi
 # check for 
 if test "$with_perl" = yes; then
   ac_save_CPPFLAGS=$CPPFLAGS
-  CPPFLAGS="$CPPFLAGS -I$perl_archlibexp/CORE"
+  CPPFLAGS="$CPPFLAGS -I$perl_includedir/CORE"
   ac_fn_c_check_header_compile "$LINENO" "perl.h" "ac_cv_header_perl_h" "#include 
 "
 if test "x$ac_cv_header_perl_h" = xyes; then :
diff --git a/configure.in b/configure.in
index 8fe6894..530f275 100644
--- a/configure.in
+++ b/configure.in
@@ -1044,6 +1044,15 @@ You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl.])
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+  perl_includedir="$PG_SYSROOT$perl_archlibexp"
+fi
+  fi
+  AC_SUBST(perl_includedir)dnl
   PGAC_CHECK_PERL_EMBED_CCFLAGS
   PGAC_CHECK_PERL_EMBED_LDFLAGS
 fi
@@ -2229,7 +2238,7 @@ fi
 # check for 
 if test "$with_perl" = yes; then
 

Re: when set track_commit_timestamp on, database system abort startup

2018-09-24 Thread Michael Paquier
Sawada-san,

On Mon, Sep 24, 2018 at 08:28:45AM +0900, Michael Paquier wrote:
> Wouldn't it be better to incorporate the new test as part of
> 004_restart.pl?  This way, we avoid initializing a full instance, which
> is always a good thing as that's very costly.  The top of this file also
> mentions that it tests clean restarts, but it bothers also about crash
> recovery.

I have been able to work on this bug, and rewrote the proposed test case
as attached.  The test can only go into v11 and HEAD.  What do you think?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4754e75436..ddc999b687 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6833,11 +6833,13 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto for commit timestamps.  Activate the facility if the setting
+	 * is enabled in the control file, as there should be no tracking of
+	 * commit timestamps done when the setting was disabled.  This facility
+	 * can be started or stopped when replaying a XLOG_PARAMETER_CHANGE
+	 * record.
 	 */
-	if (ArchiveRecoveryRequested ?
-		ControlFile->track_commit_timestamp : track_commit_timestamp)
+	if (ControlFile->track_commit_timestamp)
 		StartupCommitTs();
 
 	/*
diff --git a/src/test/modules/commit_ts/t/004_restart.pl b/src/test/modules/commit_ts/t/004_restart.pl
index daf42d3a02..4efe30a559 100644
--- a/src/test/modules/commit_ts/t/004_restart.pl
+++ b/src/test/modules/commit_ts/t/004_restart.pl
@@ -1,4 +1,4 @@
-# Testing of commit timestamps preservation across clean restarts
+# Testing of commit timestamps preservation across restarts
 use strict;
 use warnings;
 use PostgresNode;
@@ -71,12 +71,36 @@ is($after_restart_ts, $before_restart_ts,
 	'timestamps before and after restart are equal');
 
 # Now disable commit timestamps
-
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = off');
-
 $node_master->stop('fast');
+
+# Start the server, which generates a XLOG_PARAMETER_CHANGE record where
+# the parameter change is registered.
 $node_master->start;
 
+# Now restart again the server so as no record XLOG_PARAMETER_CHANGE are
+# replayed with the follow-up immediate shutdown.
+$node_master->restart;
+
+# Move commit timestamps across page boundaries.  Things should still
+# be able to work across restarts with those transactions committed while
+# track_commit_timestamp is disabled.
+$node_master->safe_psql('postgres',
+qq(CREATE PROCEDURE consume_xid(cnt int)
+AS \$\$
+DECLARE
+i int;
+BEGIN
+FOR i in 1..cnt LOOP
+EXECUTE 'SELECT txid_current()';
+COMMIT;
+END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node_master->safe_psql('postgres', 'CALL consume_xid(2000)');
+
 ($ret, $stdout, $stderr) = $node_master->psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($ret, 3, 'no commit timestamp from enable tx when cts disabled');
@@ -106,10 +130,12 @@ like(
 # Re-enable, restart and ensure we can still get the old timestamps
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = on');
 
-$node_master->stop('fast');
+# An immediate shutdown is used here.  At next startup recovery will
+# replay transactions which committed when track_commit_timestamp was
+# disabled, and the facility should be able to work properly.
+$node_master->stop('immediate');
 $node_master->start;
 
-
 my $after_enable_ts = $node_master->safe_psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($after_enable_ts, '', 'timestamp of enabled tx null after re-enable');




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2018-09-24 Thread Andrew Gierth
So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

  +: relatively straightforward parser changes
  +: no new reserved words
  +: has the option of working extensibly with all functions

  -: base_yylex needs extending to 3 lookahead tokens

** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls

  If the clauses are legal on all window functions, what to do about existing
  window functions for which the clauses do not make sense?

*** 1.1.1. Ignore the clause when the function isn't aware of it

  +: simple
  -: somewhat surprising for users perhaps?

*** 1.1.2. Change the behavior of the windowapi in some consistent way

  Not sure if this can work.
  +: fairly simple (maybe?) and predictable
  -: changes the behavior of existing window functions

** 1.2. Allow from/ignore clause on only certain functions

  +: avoids any unexpected behavior
  -: needs some way to control what functions allow it

*** 1.2.1. Check the function name in parse analysis against a fixed list.

  +: simple
  -: not extensible

*** 1.2.2. Provide some option in CREATE FUNCTION

  +: extensible
  -: fairly intrusive, adding stuff to create function and pg_proc

*** 1.2.3. Do something magical with function argument types

  +: doesn't need changes in create function / pg_proc
  -: it's an ugly hack

* 2. reserve nth_value etc. as functions

  +: follows the spec reasonably well
  +: less of a hack than extending base_yylex

  -: new reserved words
  -: more parser rules
  -: not extensible

  (now goto 1.2.1)

* 3. "just say no" to the spec

  e.g. add new functions like lead_ignore_nulls(), or add extra boolean
  args to lead() etc. telling them to skip nulls

  +: simple
  -: doesn't conform to spec
  -: using extra args isn't quite the right semantics

-- 
Andrew (irc:RhodiumToad)



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-09-24 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 17 Sep 2018 22:13:40 +0900, Michael Paquier  wrote 
in <20180917131340.ge31...@paquier.xyz>
> Hi all,
> 
> On a rather freshly-updated Debian SID server, I am able to see failures
> for the SSL TAP tests:
> 2018-09-17 22:00:27.389 JST [13072] LOG:  database system is shut down
> 2018-09-17 22:00:27.506 JST [13082] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
> 2018-09-17 22:00:27.506 JST [13082] LOG:  database system is shut down
> 2018-09-17 22:00:27.720 JST [13084] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
> 
> Wouldn't it be better to rework the rules used to generate the different
> certificates and reissue them in the tree?  It seems to me that this is
> just waiting to fail in other platforms as well..

I agree that we could get into the same trouble sooner or later.

Do you mean that cert/key files are generated on-the-fly while
running 'make check'? It sounds reasonable as long as just
replacing existing files with those with longer (2048-bit?) keys
doesn't work for all supported platforms.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Haribabu Kommi
On Tue, Sep 25, 2018 at 1:39 AM Michael Paquier  wrote:

> On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote:
> > Attached new rebased version of the patch that enhances the
> > pg_stat_statements_reset()
> > function. This needs to be applied on top of the patch that is posted in
> > [1].
>
> +CREATE ROLE stats_regress_user1;
> +CREATE ROLE stats_regress_user2;
> Just a short note: regression tests creating roles should use regress_
> as prefix.
>

Thanks for the review.
Fixed in the attached patch as per your suggestion.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-pg_stat_statements_reset-to-reset-specific-query-use_v6.patch
Description: Binary data


Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Haribabu Kommi
On Tue, Sep 25, 2018 at 10:58 AM Michael Paquier 
wrote:

> On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote:
> > For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
> > need to add a delta script not a new base script.  (If you had to
> > back-patch before v10, it might be best to add a new base script in all
> > the branches just to keep the patches consistent; but IIUC this issue
> only
> > arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
> > prototype instead.
>
> Of course, thanks.  Sorry for the incorrect reference pointing to a
> commit of REL9_6_STABLE.  As the patch only needs to be applied down to
> v10, there is no need to do anything more complicated than what Hari has
> proposed.  So, committed after a bit of comment and format tweaks.
>

Thanks for the changes and commit.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Missing const in DSA.

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 7:46 AM Thomas Munro
 wrote:
> On Tue, Sep 25, 2018 at 4:17 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
> > >> Mark G  writes:
> > >>> While looking at some of the recent churn in DSA I noticed that
> > >>> dsa_size_class_map should probably be declared const.
> >
> > >> +1 ... also, given the contents of the array, "char" seems like
> > >> rather a misnomer.  I'd be happier if it were declared as uint8, say.

Pushed.  Thanks both for the code review!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 10:39:40PM -0400, Tom Lane wrote:
> I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

Same question here.  As that's kind of a separate discussion, I left it
out.  Still, I don't mind changing that at the same time, as it's
harmless and this is only a patch for HEAD.  PGDLLIMPORT changes
don't get back-patched anyway...
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> Let's change it then.  ClientConnectionLost also needs to be changed, as
> miscadmin.h says that it could be used in a signal handler.  What do you
> think about the attached?

Looks reasonable to me (I've not tested though).

I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 09:38:11PM -0400, Tom Lane wrote:
> Yeah, in principle any global variable touched by a signal handler should
> be sig_atomic_t.  I don't know of any modern platform where using "bool"
> is unsafe, but per the C standard it could be.  The case that would be
> worrisome is if setting the variable requires a load/modify/store, which
> does apply to char-sized variables on some ancient platforms.  I think
> there's no need to worry for int-sized variables.

Let's change it then.  ClientConnectionLost also needs to be changed, as
miscadmin.h says that it could be used in a signal handler.  What do you
think about the attached?
--
Michael
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f7d6617a13..5971310aab 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -27,11 +27,11 @@
 
 ProtocolVersion FrontendProtocol;
 
-volatile bool InterruptPending = false;
-volatile bool QueryCancelPending = false;
-volatile bool ProcDiePending = false;
-volatile bool ClientConnectionLost = false;
-volatile bool IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t InterruptPending = false;
+volatile sig_atomic_t QueryCancelPending = false;
+volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t ClientConnectionLost = false;
+volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e167ee8fcb..8ff106e051 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -77,13 +77,13 @@
 
 /* in globals.c */
 /* these are marked volatile because they are set by signal handlers: */
-extern PGDLLIMPORT volatile bool InterruptPending;
-extern PGDLLIMPORT volatile bool QueryCancelPending;
-extern PGDLLIMPORT volatile bool ProcDiePending;
-extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
+extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
+extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
+extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
-extern volatile bool ClientConnectionLost;
+extern volatile sig_atomic_t ClientConnectionLost;
 
 /* these are marked volatile because they are examined by signal handlers: */
 extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;




Re: DNS SRV support for LDAP authentication

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 2:09 PM Thomas Munro
 wrote:
> 2.  Define a new zone for testing, by adding the following to the end
> 3.  Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

Oops, I changed my testing domain name in the middle of my experiment,
but pasted the older version into the previous message.  Here are the
corrected steps 2 and 3, consistent with the rest:

= end of /usr/local/etc/namedb/named.conf =
zone "my-domain.com" {
type master;
file "/usr/local/etc/namedb/master/my-domain.com";
};
=

= /usr/local/etc/namedb/master/my-domain.com =
$TTL    10
@   IN  SOA ns.my-domain.com. admin.my-domain.com. (
                  2 ; Serial
             604800 ; Refresh
              86400 ; Retry
            2419200 ; Expire
             604800 )   ; Negative Cache TTL
    IN  NS  ns.my-domain.com.
ns.my-domain.com.   IN  A   127.0.0.1
my-domain.com.  IN  A   127.0.0.1
ldap-server.my-domain.com.  IN  A   127.0.0.1
_ldap._tcp.my-domain.com.   IN  SRV 0   0   389 ldap-server
=

-- 
Thomas Munro
http://www.enterprisedb.com



RE: Changing the setting of wal_sender_timeout per standby

2018-09-24 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Okay, I have pushed the patch with all your suggestions included.

Thanks so much!

Regards
Takayuki Tsunakawa





DNS SRV support for LDAP authentication

2018-09-24 Thread Thomas Munro
Hello hackers,

Some people like to use DNS SRV records to advertise LDAP servers on
their network.  Microsoft Active Directory is usually (always?) set up
that way.  Here is a patch to allow our LDAP auth module to support
that kind of discovery.  It copies the convention of the OpenLDAP
command line tools: if you give it a URL that has no hostname, it'll
try to extract a domain name from the bind DN, and then ask your DNS
server for a SRV record for LDAP-over-TCP at that domain.  The
OpenLDAP version of libldap.so exports the magic to do that, so the
patch is very small (but the infrastructure set-up to test it is a bit
of a schlep, see below).  I'll add this to the next Commitfest.

Testing instructions (paths and commands given for FreeBSD; adjust
as appropriate):

1.  Install BIND:

$ sudo pkg install bind99

2.  Define a new zone for testing, by adding the following to the end
of /usr/local/etc/namedb/named.conf:

= 8< =
zone "my.test.domain" {
type master;
file "/usr/local/etc/namedb/master/my.test.domain";
};
= 8< =


3.  Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

= 8< =
$TTL    10
@   IN  SOA ns.my.test.domain. admin.my.test.domain. (
                  2 ; Serial
             604800 ; Refresh
              86400 ; Retry
            2419200 ; Expire
             604800 )   ; Negative Cache TTL
    IN  NS  ns.my.test.domain.
ns.my.test.domain.  IN  A   127.0.0.1
my.test.domain. IN  A   127.0.0.1
ldap-server.my.test.domain. IN  A   127.0.0.1
_ldap._tcp.my.test.domain.  IN  SRV 0   0   389 ldap-server
= 8< =

4.  Start up bind:

# service named onestart

5.  Confirm that SRV lookups find our record:

$ dig @localhost _ldap._tcp.my-domain.com SRV
...
;; ANSWER SECTION:
_ldap._tcp.my-domain.com. 10 IN  SRV 0 0 389 ldap-server.my-domain.com.

6.  Tell your system libraries to use this DNS server by temporarily
changing /etc/resolv.conf to say:

= 8< =
nameserver 127.0.0.1
= 8< =

7.  Confirm that the OpenLDAP tools can look that SRV record up:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcom'

(That's "ou=blah,dc=my-domain,dc=com" URL-encoded, from which
"my-domain.com" will be extracted.)  You should see that it's trying
to connect to ldap-server port 389, and you can stick 'x' on the end
of it to see what it looks like when it can't find a SRV record, as a
sanity check:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcomx'
DNS SRV: Could not turn domain=my-domain.comx into a hostlist

8.  Set up an LDAP server listening on localhost port 389, and create
a user, such that you can actually authenticate from PostgreSQL with
it.  Gory details omitted.  First test that you can log in with LDAP
authentication when using a pg_hba.conf line like this:

host all fred 127.0.0.1/32 ldap
ldapurl="ldap://ldap-server.my-domain.com/dc=my-domain,dc=com?cn?sub"

9.  Next apply the patch and verify that you can take out the hostname
and let it be discovered via DNS SRV:

host all fred 127.0.0.1/32 ldap ldapurl="ldap:///dc=my-domain,dc=com?cn?sub"

(You can stick some elog(LOG, ...) lines into
InitializeLDAPConnection() if you want to check that
ldap_domain2hostlist() is in fact finding the hostname and port.)

This is a first draft.  Not tested much yet.  I wonder if
HAVE_LDAP_INITIALIZE is a reasonable way to detect OpenLDAP.  The
documentation was written in about 7 seconds so probably needs work.
There is probably a Windowsy way to do this too but I didn't look into
that.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-DNS-SRV-support-for-LDAP-server-discovery.patch
Description: Binary data


Re: [patch] Bug in pg_dump/pg_restore using --no-publication

2018-09-24 Thread Michael Paquier
On Fri, Sep 21, 2018 at 05:44:02PM +0200, Gilles Darold wrote:
> Attached is a patch that fixes a bug in pg_dump since 10.0 and
> reproducible in master. When using option --no-publication : ALTER
> PUBLICATION orders are still present in the dump.

Thanks for the report, the patch, and the test case, Gilles!

> pg_restore --no-publication test.dump -l test.dump| grep PUBLICATION
> 2230; 6106 16389 PUBLICATION TABLE public p1 t1
> 2231; 6106 16392 PUBLICATION TABLE public p_insert_only t1

This command does not actually work ;)

> Should I add it to current commitfest ?

No need to.  I have committed your patch down to v10 after making sure
that we are not missing any other spots, and testing each branch
manually.
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> At the same time, all the pending flags in miscadmin.h could be switched
> to sig_atomic_t if we were to be correct, no?  The counters could be
> higher than 256 so that's not really possible. 

Yeah, in principle any global variable touched by a signal handler should
be sig_atomic_t.  I don't know of any modern platform where using "bool"
is unsafe, but per the C standard it could be.  The case that would be
worrisome is if setting the variable requires a load/modify/store, which
does apply to char-sized variables on some ancient platforms.  I think
there's no need to worry for int-sized variables.
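
For readers following along, the per-flag pattern being defended here looks
like this in miniature.  This is generic C, not the PostgreSQL sources; the
real backend handlers also save and restore errno and set the process latch
rather than relying on pause():

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * One flag per event, each written with a bare assignment in the handler:
 * a plain store to a volatile sig_atomic_t is the only kind of write the C
 * standard guarantees to be safe from a signal handler.
 */
static volatile sig_atomic_t reload_pending = 0;
static volatile sig_atomic_t shutdown_pending = 0;

static void
sighup_handler(int signo)
{
    (void) signo;
    reload_pending = 1;         /* plain store, no read-modify-write */
}

static void
sigterm_handler(int signo)
{
    (void) signo;
    shutdown_pending = 1;
}

int
main(void)
{
    signal(SIGHUP, sighup_handler);
    signal(SIGTERM, sigterm_handler);

    for (;;)
    {
        if (reload_pending)
        {
            reload_pending = 0;
            printf("would reload configuration here\n");
        }
        if (shutdown_pending)
        {
            printf("shutting down\n");
            break;
        }
        pause();                /* wait for the next signal */
    }
    return 0;
}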

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 09:03:49PM -0400, Tom Lane wrote:
> You could only fix that by blocking all signal handling during the
> handler, which would be expensive and rather pointless.
> 
> I do not think that it's readily possible to improve on the current
> situation with one sig_atomic_t per flag.

Okay, thanks for the confirmation.

At the same time, all the pending flags in miscadmin.h could be switched
to sig_atomic_t if we were to be correct, no?  The counters could be
higher than 256 so that's not really possible. 
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> And then within separate signal handlers things like:
> void
> StatementCancelHandler(SIGNAL_ARGS)
> {
> [...]
> signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
> [...]
> }

AFAICS this still wouldn't work.  The machine code is still going to
look (on many machines) like "load from signalPendingFlags,
OR in some bits, store to signalPendingFlags".  So there's still a
window for another signal handler to interrupt that and store some
bits that will get lost.

You could only fix that by blocking all signal handling during the
handler, which would be expensive and rather pointless.

I do not think that it's readily possible to improve on the current
situation with one sig_atomic_t per flag.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 05:23:56PM -0700, Andres Freund wrote:
> On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
>> Anyway, putting the back-patching pain aside, and just for my own
>> knowledge...  Andres, would it be fine to just use one sig_atomic_t
>> field which can be set from different code paths?  Say:
>> typedef enum SignalPendingType {
>> PENDING_INTERRUPT,
>> PENDING_CANCEL_QUERY,
>> PENDING_PROC_DIE,
>> PENDING_RELOAD,
>> PENDING_SESSION_TIMEOUT
>> };
> 
> Well, they'd have to be different bits...

Sure, I forgot to write the "foo = 1 << 0" and such ;)

>> extern volatile sig_atomic_t signalPendingFlags;
> 
> Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits
> wide - so you couldn't have that many flags.

I see.  I thought that it mapped to at least an int, but C99 says that it
can be as small as a char.  That's indeed quite limited...
--
Michael




Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote:
> For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
> need to add a delta script not a new base script.  (If you had to
> back-patch before v10, it might be best to add a new base script in all
> the branches just to keep the patches consistent; but IIUC this issue only
> arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
> prototype instead.

Of course, thanks.  Sorry for the incorrect reference pointing to a
commit of REL9_6_STABLE.  As the patch only needs to be applied down to
v10, there is no need to do anything more complicated than what Hari has
proposed.  So, committed after a bit of comment and format tweaks.
--
Michael




Re: Implementing SQL ASSERTION

2018-09-24 Thread Andrew Gierth
> "Joe" == Joe Wildish  writes:

 Joe> Agreed. My assumption was that we would record in the data
 Joe> dictionary the behaviour (or “polarity") of each aggregate
 Joe> function with respect to the various operators. Column in
 Joe> pg_aggregate? I don’t know how we’d record it exactly.

I haven't looked at the background of this, but if what you want to know
is whether the aggregate function has the semantics of min() or max()
(and if so, which) then the place to look is pg_aggregate.aggsortop.

(For a given aggregate foo(x), the presence of an operator oid in
aggsortop means something like "foo(x) is equivalent to (select x from
... order by x using OP limit 1)", and the planner will replace the
aggregate by the applicable subquery if it thinks it'd be faster.)

As for operators, you can only make assumptions about their meaning if
the operator is a member of some opfamily that assigns it some
semantics. For example, the planner can assume that WHERE x=y AND x=1
implies that y=1 (assuming x and y are of appropriate types) not because
it assumes that "=" is the name of a transitive operator, but because
the operators actually selected for (x=1) and (x=y) are both "equality"
members of the same btree operator family. Likewise proving that (a>2)
implies (a>1) requires knowing that > is a btree comparison op.

-- 
Andrew (irc:RhodiumToad)



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Andres Freund
On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
> On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> > This doesn't seem to solve an actual problem, why are we discussing
> > changing this? What'd be measurably improved, worth the cost of making
> > backpatching more painful?
> 
> My point was just to reduce the number of variables used and ease
> debugger lookups with what is on the stack.

I'm not sure a bitflag really gives you that - before gdb gives you the
plain value, afterwards you need to know the enum values and do bit math
to know.


> Anyway, putting the back-patching pain aside, and just for my own
> knowledge...  Andres, would it be fine to just use one sig_atomic_t
> field which can be set from different code paths?  Say:
> typedef enum SignalPendingType {
> PENDING_INTERRUPT,
> PENDING_CANCEL_QUERY,
> PENDING_PROC_DIE,
> PENDING_RELOAD,
> PENDING_SESSION_TIMEOUT
> };

Well, they'd have to be different bits...


> extern volatile sig_atomic_t signalPendingFlags;

Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits
wide - so you couldn't have that many flags.


Greetings,

Andres Freund



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> This doesn't seem to solve an actual problem, why are we discussing
> changing this? What'd be measurably improved, worth the cost of making
> backpatching more painful?

My point was just to reduce the number of variables used and ease
debugger lookups with what is on the stack.

Anyway, putting the back-patching pain aside, and just for my own
knowledge...  Andres, would it be fine to just use one sig_atomic_t
field which can be set from different code paths?  Say:
typedef enum SignalPendingType {
PENDING_INTERRUPT,
PENDING_CANCEL_QUERY,
PENDING_PROC_DIE,
PENDING_RELOAD,
PENDING_SESSION_TIMEOUT
};
extern volatile sig_atomic_t signalPendingFlags;

And then within separate signal handlers things like:
void
StatementCancelHandler(SIGNAL_ARGS)
{
[...]
signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
[...]
}

void
PostgresSigHupHandler(SIGNAL_ARGS)
{
[...]
signalPendingFlags |= ConfigReloadPending;
[...]
}
--
Michael




Re[2]: Adding a note to protocol.sgml regarding CopyData

2018-09-24 Thread Bradley DeJong

Thanks for the feedback.

On 2018-09-22, Amit Kapila wrote ...
> ... Why can't we just extend the current Note where it is currently 
...


Because information about how the protocol works belongs in the protocol
documentation, not in the documentation for one implementation of the
protocol.


Think of it this way: if the only full explanation of this information
were in the psqlODBC or pgJDBC documentation, would you feel comfortable
just referencing it from protocol.sgml?  I would not, and, in my opinion,
libpq's being the reference client implementation should not change
that.


On top of that, in the libpq documentation the termination line is only 
mentioned in a section titled "Obsolete Functions for COPY" which makes 
it even less likely that someone working on a different implementation 
of the protocol will notice it.


[strong opinion - I would object to leaving the only description in the 
libpq documentation]


> ... why do we want to duplicate the same information ...

The change to the CopyData message documentation does refer back to the 
full description. My intent with the brief description of the \. line 
was to include enough information so that the reader could decide 
whether or not skipping back to the full description would be useful. I 
think that usefulness outweighs the minor duplication.


[moderate opinion - I plan to leave it as is unless others weigh in in 
favor of just keeping the reference]


But given that I don't work on libpq or even use it, I'm not comfortable 
changing the documentation of 4 different libpq methods (even obsolete 
methods) on my own initiative. If the committer who picks this up wants 
the libpq documentation changed as part of this, that would be different 
and I'd be willing to give it a shot.


[no strong feelings one way or the other - I would leave the libpq 
documentation as is but could easily be swayed]


> ... duplicate the same information in different words at three 
different places ...


I count 7 different places.  In the protocol docs, there is the old
mention in the "Summary of Changes since Protocol 2.0" section and the
two new mentions in the protocol definitions; plus, after reading through
libpq-copy.html again, I think all 4 of these sections refer to the
terminating line / end-of-copy-data marker.


PQgetline - "... the application must check to see if a new line 
consists of the two characters \., which indicates ..."
PQgetlineAsync - "... returns -1 if the end-of-copy-data marker has 
been recognized ..."
PQputline - "... Note ...send the two characters \. as a final line 
..."
PQendcopy - "... followed by PQendcopy after the terminator line is 
seen ..."


[informational - lists references to remove when a future protocol drops 
the \. line entirely]
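
For context, the obsolete interfaces listed above use the terminator like
this.  It is only a minimal fragment: my_table is a hypothetical table,
connection parameters come from the environment, and error handling (for
example checking for PGRES_COPY_IN) is mostly omitted:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexec(conn, "COPY my_table FROM STDIN");
    PQclear(res);

    PQputline(conn, "1\tone\n");
    PQputline(conn, "2\ttwo\n");
    PQputline(conn, "\\.\n");   /* the end-of-copy-data terminator line */
    PQendcopy(conn);            /* finish the COPY after the terminator */

    PQfinish(conn);
    return 0;
}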





Re: Implementing SQL ASSERTION

2018-09-24 Thread Joe Wildish
Hi Peter,

> On 24 Sep 2018, at 15:06, Peter Eisentraut
>  wrote:
> 
> On 29/04/2018 20:18, Joe Wildish wrote:
>> 
>> Attached is a rebased patch for the prototype.
> 
> I took a look at this.

Thank you for reviewing.

> This has been lying around for a few months, so it will need to be
> rebased again.
> 
> 8< - - - snipped for brevity - - - 8<
> 
> All this new code in constraint.c that checks the assertion expression
> needs more comments and documentation.

All agreed.  I’ll give the patch some TLC and get a new version that
addresses the above.

> Stuff like this isn't going to work:
> 
> static int
> funcMaskForFuncOid(Oid funcOid)
> {
>char *name = get_func_name(funcOid);
> 
>if (name == NULL)
>return OTHER_FUNC;
>else if (strncmp(name, "min", strlen("min")) == 0)
>return MIN_AGG_FUNC;
>else if (strncmp(name, "max", strlen("max")) == 0)
>return MAX_AGG_FUNC;
> 
> You can't assume from the name of a function what it's going to do.
> Solving this properly might be hard.

Agreed. My assumption was that we would record in the data dictionary the
behaviour (or “polarity") of each aggregate function with respect to the
various operators. Column in pg_aggregate? I don’t know how we’d record it
exactly. A bitmask would be a possibility. Also, I don’t know what we’d do
with custom aggregate functions (or indeed custom operators). Allowing end
users to determine the value would potentially lead to assertion checks
being incorrectly skipped. Maybe we’d say that custom aggregates always
have a neutral polarity and are therefore not subject to this
optimisation.

> This ought to be reproducible for you if you build with assertions.

Yes. I shall correct this when I do the aforementioned rebase and
application of TLC.

> My feeling is that if we want to move forward on this topic, we need to
> solve the concurrency question first.  All these optimizations for when
> we don't need to check the assertion are cool, but they are just
> optimizations that we can apply later on, once we have solved the
> critical problems.

I obviously agree that the concurrency issue needs solving. But I don’t
see that at all as a separate matter from the algos. Far from being merely
optimisations, the research indicates we can go a lot further toward
reducing the need for rechecks and, therefore, reducing the chance of
concurrency conflicts from occurring in the first place. This is true
regardless of whatever mechanism we use to enforce correct behaviour under
concurrent modifications -- e.g. a lock on the ASSERTION object itself,
enforced use of SERIALIZABLE, etc.

By way of example (lifted directly from the AM4DP book):

CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  dept INTEGER NOT NULL,
  job TEXT NOT NULL
);

CREATE ASSERTION department_managers_need_administrators CHECK
  (NOT EXISTS
    (SELECT dept
       FROM employee a
      WHERE EXISTS (SELECT * FROM employee b
                     WHERE a.dept = b.dept
                       AND b.job IN ('Manager', 'Senior Manager'))
        AND NOT EXISTS (SELECT * FROM employee b
                         WHERE a.dept = b.dept
                           AND b.job = 'Administrator')));

The current implementation derives "DELETE(employee), INSERT(employee) and
UPDATE(employee.dept, employee.job)" as the set of invalidating operations
and triggers accordingly. However, in this case, we can supplement the
triggers by having them inspect the transition tables to see if the actual
data from the triggering DML statement could in fact affect the truth of
the expression: specifically, only do the recheck on DELETE of an
"Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when
the new job is a "Manager" or "Senior Manager" or the old job was an
"Administrator".

Now, if this is a company with 10,000 employees, which would therefore
presumably only require a handful of managers, right? ;-), then the
potential for a concurrency conflict is massively reduced compared to
rechecking every time the employee table is touched.

(This optimisation has some caveats and is reliant upon being able to
derive the key of an expression from the underlying base tables plus some
stuff about functional dependencies. I have started work on it but sadly
not had time to progress it in recent months).

Having said all that: there are obviously going to be some expressions
that cannot be proven to have no potential for invalidating the assertion
truth. I guess this is the prime concern from a concurrency PoV? Example:

CREATE TABLE t (
  b BOOLEAN NOT NULL,
  n INTEGER NOT NULL,
  PRIMARY KEY (b, n)
);

CREATE ASSERTION sum_per_b_less_than_10 CHECK
  (NOT EXISTS
(SELECT FROM (SELECT b, SUM(n)
FROM t
   GROUP BY b) AS v(b, sum_n)
  WHERE sum_n > 10));

Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the
interesting case, from a concurrency perspective, is how do we avoid an
INSERT WHERE 

Re: Calculate total_table_pages after set_base_rel_sizes()

2018-09-24 Thread Edmund Horner
David Rowley said:
> I believe that we should be delaying the PlannerInfo's
> total_table_pages calculation until after constraint exclusion and
> partition pruning have taken place. Doing this calculation before we
> determine which relations we don't need to scan can lead to
> incorrectly applying random_page_cost to too many pages processed
> during an Index Scan.
> 
> We already don't count relations removed by join removals from this
> calculation, so counting pruned partitions seems like an omission.
> 
> The attached patch moves the calculation to after set_base_rel_sizes()
> is called and before set_base_rel_pathlists() is called, where the
> information is actually used.
> 
> I am considering this a bug fix, but I'm proposing this for PG12 only
> as I don't think destabilising plans in the back branches is a good
> idea. I'll add this to the September commitfest.

Hi David, I had a quick look at this.  (I haven't tested it so this isn't a 
full review.)

It looks like a fairly straightforward code move.  And I think it's correct to 
exclude the pages from partitions that won't be read.

I have a small tweak.  In make_one_rel, we currently have:

/*
 * Compute size estimates and consider_parallel flags for each base rel,
 * then generate access paths.
 */
set_base_rel_sizes(root);
set_base_rel_pathlists(root);

Your patch inserts code between the two lines.  I think the comment should be 
split too.

/* Compute size estimates and consider_parallel flags for each base rel. */
set_base_rel_sizes(root);

// NEW CODE

/* Generate access paths. */
set_base_rel_pathlists(root);

Cheers,
Edmund

Re: auto_explain: Include JIT output if applicable

2018-09-24 Thread Lukas Fittl
On Mon, Sep 24, 2018 at 1:48 PM, Andres Freund  wrote:
>
> Thanks for noticing - pushed!
>

Thanks!

Best,
Lukas

-- 
Lukas Fittl


Re: PATCH: Update snowball stemmers

2018-09-24 Thread Tom Lane
Arthur Zakirov  writes:
> Ah, I see. I attached new version made with --no-renames. Will wait for
> what cfbot will say.

I reviewed and pushed this.

As a cross-check on the patch, I cloned the Snowball github repo
and built the derived files in it.  I noticed that they'd incorporated
several new stemmers since 2007 --- not only your Nepali one, but
half a dozen more besides.  Since the point here is (IMO) mostly to
follow their lead on what's interesting, I went ahead and added those
as well.

In short, therefore, the commit includes the Nepali stuff from your
other thread as well as what was in this one.

Although I added nepali.stop from the other patch, I've not done
anything about updating our other stopword lists.  Presumably those
are a bit obsolete by now as well.  I wonder if we can prevail on
the Snowball people to make those available in some less painful way
than scraping them off assorted web pages.  Ideally they'd stick them
into their git repo ...

regards, tom lane



Re: Collation versioning

2018-09-24 Thread Peter Geoghegan
On Mon, Sep 24, 2018 at 1:47 PM Thomas Munro
 wrote:
> Personally I'm not planning to work on multi-version installation any
> time soon, I was just scoping out some basic facts about all this.  I
> think the primary problem that affects most of our users is the
> shifting-under-your-feet problem, which we now see applies equally to
> libc and libicu.

Are we sure about that? Could it just be that ICU will fix bugs that
cause their strcoll()-alike and strxfrm()-alike functions to give
behavior that isn't consistent with the behavior required by the CLDR
version in use?

This seems like it might be a very useful distinction. We know that
glibc had bugs that were caused by strxfrm() not agreeing with
strcoll() -- that was behind the 9.5-era abbreviated keys issues. But
that was actually a bug in an optimization in strcoll(), rather than a
strxfrm() bug. strxfrm() gave the correct answer, which is to say the
answer that was right according to the high level collation
definition. It merely failed to be bug-compatible with strcoll().
What's ICU supposed to do about an issue like that?

If we're going to continue to rely on the strxfrm() equivalent from
ICU, then it seems to me that ICU should be able to change behaviors
in a stable release, provided the behavior they're changing is down to
a bug in their infrastructure, as opposed to an organic evolution in
how some locale sorts text (CLDR update). My understanding is that ICU
is designed to decouple technical issues with issues of concern to
natural language experts, so we as an ICU client can limit ourselves
to worrying about one of the two at any given time.

-- 
Peter Geoghegan



Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Alvaro Herrera
On 2018-Sep-24, Sergei Kornilov wrote:

> Hi
> 
> > An autovacuum can't be just aggressive; it's either anti-wraparound or 
> > normal.
> But autovacuum _can_ be aggressive and not anti-wraparound.
> I build current master and can see 3 different line types:
> 2018-09-24 23:47:31.500 MSK 27939 @ from  [vxid:4/272032 txid:0] [] LOG:  
> automatic aggressive vacuum of table "postgres.public.foo": index scans: 0
> 2018-09-24 23:49:27.892 MSK 28333 @ from  [vxid:4/284297 txid:0] [] LOG:  
> automatic aggressive vacuum to prevent wraparound of table 
> "postgres.public.foo": index scans: 0
> 2018-09-24 23:49:29.093 MSK 28337 @ from  [vxid:4/284412 txid:0] [] LOG:  
> automatic vacuum of table "postgres.public.foo": index scans: 0

Exactly.

It cannot be anti-wraparound and not aggressive, which is the line type
not shown.

"Aggressive" means it scans all pages; "anti-wraparound" means it does
not let itself be cancelled because of another process waiting for a
lock on the table.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Include application_name in "connection authorized" log message

2018-09-24 Thread Stephen Frost
Greetings Don!

* Don Seiler (d...@seiler.us) wrote:
> On Tue, Aug 7, 2018 at 12:32 PM Tom Lane  wrote:
> > Don Seiler  writes:
> >
> > > 1. We want to make a generic, central ascii-lobotomizing function similar
> > > to check_application_name that we can re-use there and for other checks
> > (eg
> > > user name).
> > > 2. Change check_application_name to call this function (or just call this
> > > function instead of check_application_name()?)
> >
> > check_application_name's API is dictated by the GUC check-hook interface,
> > and doesn't really make sense for this other use.  So the first part of
> > that, not the second.
> >
> > > 3. Call this function when storing the value in the port struct.
> >
> > I'm not sure where exactly is the most sensible place to call it,
> > but trying to minimize the number of places that know about this
> > kluge seems like a good principle.
> 
> OK I created a new function called clean_ascii() in common/string.c. I call
> this from my new logic in postmaster.c as well as replacing the logic in
> guc.c's check_application_name() and check_cluster_name().

Since we're putting it into common/string.c (which seems pretty
reasonable to me, at least), I went ahead and changed it to be
'pg_clean_ascii'.  I didn't see any other obvious cases where we could
use this function (though typecmds.c does have an interesting ASCII
check for type categories..).
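
For readers who haven't looked at the GUC check hooks, the kind of
ascii-lobotomizing in question is roughly this.  It is an illustrative
sketch of the idea only; the actual pg_clean_ascii in the attached patch
may differ in naming, allocation, and exactly which characters it accepts:

#include <stdio.h>

/* Replace anything outside the printable ASCII range with '?', in place. */
static void
clean_ascii_sketch(char *str)
{
    for (char *p = str; *p; p++)
    {
        if (*p < 32 || *p > 126)
            *p = '?';
    }
}

int
main(void)
{
    char        name[] = "app\xc3\xa9\tname";

    clean_ascii_sketch(name);
    printf("%s\n", name);       /* prints "app???name" */
    return 0;
}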

Otherwise, I added some comments, added application_name to the
replication 'connection authorized' messages (seems like we really
should be consistent across all of them...), ran it through pgindent,
and updated a variable name or two here and there.

> I've been fighting my own confusion with git and rebasing and fighting the
> same conflicts over and over and over, but this patch should be what I
> want. If anyone has time to review my git process, I would appreciate it. I
> must be doing something wrong to have these same conflicts every time I
> rebase (or I completely misunderstand what it actually does).

I'd be happy to chat about it sometime, of course, just have to find
time when we both have a free moment. :)

Attached is the updated patch.  If you get a chance to look over it
again and make sure it looks good to you, that'd be great.  I did a bit
of testing of it myself but wouldn't complain if someone else wanted to
also.

One thing I noticed while testing is that our 'disconnection' message
still emits 'database=' for replication connections even though the
'connection authorized' message doesn't (presumably because we realized
it's a bit silly to do so when we say 'replication connection'...).
Seems like it'd be nice to have the log_connection / log_disconnection
messages have some consistency about them but that's really a different
discussion from this.

Thanks!

Stephen
From 7151ee89e1663a762f928f33ad4023e70257ef9e Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 24 Sep 2018 15:59:50 -0400
Subject: [PATCH] Add application_name to connection authorized msg

The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.

Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC).  There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.

The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.

Author: Don Seiler 
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
---
 src/backend/postmaster/postmaster.c | 16 +
 src/backend/utils/init/postinit.c   | 54 -
 src/backend/utils/misc/guc.c| 17 ++---
 src/common/string.c | 19 ++
 src/include/common/string.h |  1 +
 src/include/libpq/libpq-be.h|  7 
 6 files changed, 84 insertions(+), 30 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 305ff36258..41de140ae0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -99,6 +99,7 @@
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
 #include "common/ip.h"
+#include "common/string.h"
 #include "lib/ilist.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -2096,6 +2097,21 @@ retry1:
 			pstrdup(nameptr));
 port->guc_options = lappend(port->guc_options,
 			pstrdup(valptr));
+
+/*
+ * Copy application_name to port if we come across it.  This
+ * is done so we can log the 

Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Sergei Kornilov
Hi

> An autovacuum can't be just aggressive; it's either anti-wraparound or normal.
But autovacuum _can_ be aggressive and not anti-wraparound.
I build current master and can see 3 different line types:
2018-09-24 23:47:31.500 MSK 27939 @ from  [vxid:4/272032 txid:0] [] LOG:  automatic aggressive vacuum of table "postgres.public.foo": index scans: 0
2018-09-24 23:49:27.892 MSK 28333 @ from  [vxid:4/284297 txid:0] [] LOG:  automatic aggressive vacuum to prevent wraparound of table "postgres.public.foo": index scans: 0
2018-09-24 23:49:29.093 MSK 28337 @ from  [vxid:4/284412 txid:0] [] LOG:  automatic vacuum of table "postgres.public.foo": index scans: 0

regards, Sergei



Re: auto_explain: Include JIT output if applicable

2018-09-24 Thread Andres Freund
Hi,

On 2018-09-24 11:34:38 -0700, Lukas Fittl wrote:
> Hi,
> 
> Whilst playing around with auto_explain and JIT today, I realized that
> auto_explain currently doesn't output JIT information, which is rather
> unfortunate when analyzing a larger set of queries in a semi-automated
> manner.
> 
> Attached a trivial patch that fixes the issue and adds JIT information to
> auto_explain with the same logic as used for regular EXPLAIN.

Thanks for noticing - pushed!

It's pretty annoying that so much of this code is duplicated in
auto_explain. It'd be good if we refactored explain.c so that there's
less duplication. But that seems like it'd not be v11 work, so...

- Andres



Re: Collation versioning

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 4:26 AM Douglas Doole  wrote:
> On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro  
> wrote:
>> Admittedly that creates a whole can
>> of worms for initdb-time catalog creation, package maintainers' jobs,
>> how long old versions have to be supported and how you upgraded
>> database objects to new ICU versions.
>
>
> Yep. We never came up with a good answer for that before I left IBM. At the
> time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as
> part of the install bundle.
>
> Long term, I think the only viable approach to supporting multiple versions 
> of ICU is runtime loading of the libraries. Then it's up to the system 
> administrator to make sure the necessary versions are installed on the system.

I wonder if we would be practically constrained to using the
distro-supplied ICU (by their policies of not allowing packages to
ship their own copies ICU); it seems like it.  I wonder which distros
allow multiple versions of ICU to be installed.  I see that Debian 9.5
only has 57 in the default repo, but the major version is in the
package name (what is the proper term for that kind of versioning?)
and it doesn't declare a conflict with other versions, so that's
promising.  Poking around with nm I noticed also that both the RHEL
and Debian ICU libraries have explicitly versioned symbol names like
"ucol_strcollUTF8_57", which is also promising.  FreeBSD seems to have
used "--disable-renaming" and therefore defines only
"ucol_strcollUTF8"; doh.

This topic is discussed here:
http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library

Personally I'm not planning to work on multi-version installation any
time soon, I was just scoping out some basic facts about all this.  I
think the primary problem that affects most of our users is the
shifting-under-your-feet problem, which we now see applies equally to
libc and libicu.

>> Yeah, it seems like ICU is *also* subject to minor changes that happen
>> under your feet, much like libc.  For example maintenance release 60.2
>> (you can't install that at the same time as 60.1, but you can install
>> it at the same time as 59.2).  You'd be linked against libicu.so.60
>> (and thence libicudata.so.60), and it gets upgraded in place when you
>> run the local equivalent of apt-get upgrade.
>
> This always worried me because an unexpected collation change is so painful 
> for a database. And I was never able to think of a way of reliably testing 
> compatibility either because of ICU's ability to reorder and group characters 
> when collating.

I think the best we can do is to track versions per dependency (ie
record it when the CHECK is created, when the index is created or
rebuilt, ...) and generate loud warnings until you've dealt with each
version dependency.  That's why I've suggested we could consider
sticking it on pg_depend (though I have apparently failed to convince
Stephen so far).  I think something like that is better than the
current collversion design, which punts the problem to the DBA: "hey,
human, there might be some problems, but I don't know where!  Please
tell me when you've fixed them by running ALTER COLLATION ... REFRESH
VERSION!" instead of having the computer track of what actually needs
to be done on an object-by-object basis and update the versions
one-by-one automatically when the problems are resolved.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: fast default vs triggers

2018-09-24 Thread Andrew Dunstan




On 09/20/2018 09:04 AM, Tomas Vondra wrote:



On 09/19/2018 10:35 PM, Andrew Dunstan wrote:



On 09/18/2018 03:36 PM, Andrew Dunstan wrote:



Tomas Vondra has pointed out to me that there's an issue with 
triggers not getting expanded tuples for columns with fast defaults. 
Here is an example that shows the issue:



   andrew=# create table blurfl (id int);
   CREATE TABLE
   andrew=# insert into blurfl select x from generate_series(1,5) x;
   INSERT 0 5
   andrew=# alter table blurfl add column x int default 100;
   ALTER TABLE
   andrew=# create or replace function showmej() returns trigger
   language plpgsql as $$ declare j json; begin j := to_json(old);
   raise notice 'old x: %', j->>'x'; return new; end; $$;
   CREATE FUNCTION
   andrew=# create trigger show_x before update on blurfl for each row
   execute procedure showmej();
   CREATE TRIGGER
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 
   UPDATE 1
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 100
   UPDATE 1
   andrew=#


The error is fixed with this patch:


   diff --git a/src/backend/commands/trigger.c 
b/src/backend/commands/trigger.c

   index 2436692..f34a72a 100644
   --- a/src/backend/commands/trigger.c
   +++ b/src/backend/commands/trigger.c
   @@ -3396,7 +3396,11 @@ ltrmark:;
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    }
    -   result = heap_copytuple(&tuple);
   +   if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
   +       result = heap_expand_tuple(&tuple, relation->rd_att);
   +   else
   +       result = heap_copytuple(&tuple);
   +
    ReleaseBuffer(buffer);
     return result;

I'm going to re-check the various places that this might have been 
missed. I guess it belongs on the open items list.








I haven't found anything further that is misbehaving. I paid 
particular attention to indexes and index-only scans.


I propose to commit this along with an appropriate regression test.



Seems reasonable to me.




This exposed a further issue with nulls in certain positions. A fix for 
both issues, and some regression tests, has been committed.


Thanks to Tomas for his help.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Nasby, Jim

> On Sep 24, 2018, at 1:29 PM, Andres Freund  wrote:
> 
> I'm very doubtful this is an improvement. Especially with the upcoming
> pluggable storage work making vacuumlazy.c heap specific, while vacuum.c
> stays generic.  The concept of something like
> PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much
> (even if criteria for it might).

That’s already a problem since vacuum logging is spread all over while autovac 
logging is not. Perhaps there needs to be some sort of vacuum_log() function 
that immediately provides output for manual vacuums, but aggregates output for 
autovac. AFAIK that’s the only real reason for autovac logging being a special 
case today.

Re: Participate in GCI as a Mentor

2018-09-24 Thread Tahir Ramzan
Thanks Sarah,

I have received the emails and invitation. Now, I am reviewing all stuff
and will respond with my ideas within a day.

Regards
Tahir

On Tue, Sep 25, 2018 at 12:34 AM Sarah Conway Schnurr <
xenophene...@gmail.com> wrote:

> Tahir,
>
> I've extended an invitation through GCI for you to become a mentor for the
> PostgreSQL
> organization, and have sent an additional email containing information on
> how to get started
> with contributing to this year's contest. We have one month left to get
> over a hundred tasks
> uploaded, so that is our current priority!
>
> The PostgreSQL wiki link has the current list of tasks available as well
> as a full list of other
> mentors in the program for this year, found here:
> https://wiki.postgresql.org/wiki/GCI_2018
>
> Thank you very much for your interest - we're thrilled to have you be a
> part of GCI 2018!
>
> Sarah
>
> On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan <
> tahirram...@alumni.vu.edu.pk> wrote:
>
>> Honorable Concern,
>>
>> I want to join GCI as a mentor, please guide me about the procedure,
>> thanks in anticipation.
>>
>> --
>> Regards
>> Tahir Ramzan
>> MSCS Research Scholar
>> Google Summer of Code 2015 (CiviCRM)
>> Google Summer of Code 2016 (ModSecurity)
>> Outside Collaborator of SpiderLabs (Powered by TrustWave)
>> Google Android Students Club Facilitator and Organizer 2015
>>
>> Contact:
>>
>> +92-312-5518018
>>
>> tahirram...@alumni.vu.edu.pk
>>
>>
>> More details about me and my work:
>>
>> GitHub Profile: https://github.com/tahirramzan
>>
>> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
>>
>
>
> --
> Sarah Conway Schnurr
>


-- 
Regards
Tahir Ramzan
MSCS Research Scholar
Google Summer of Code 2015 (CiviCRM)
Google Summer of Code 2016 (ModSecurity)
Outside Collaborator of SpiderLabs (Powered by TrustWave)
Google Android Students Club Facilitator and Organizer 2015

Contact:

+92-312-5518018

tahirram...@alumni.vu.edu.pk


More details about me and my work:

GitHub Profile: https://github.com/tahirramzan

LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan


Re: Missing const in DSA.

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 4:17 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
> >> Mark G  writes:
> >>> While looking at some of the recent churn in DSA I noticed that
> >>> dsa_size_class_map should probably be declared const.
>
> >> +1 ... also, given the contents of the array, "char" seems like
> >> rather a misnomer.  I'd be happier if it were declared as uint8, say.
>
> > +1
>
> Are you planning to take care of this?

Will do.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Participate in GCI as a Mentor

2018-09-24 Thread Sarah Conway Schnurr
Tahir,

I've extended an invitation through GCI for you to become a mentor for the
PostgreSQL
organization, and have sent an additional email containing information on
how to get started
with contributing to this year's contest. We have one month left to get
over a hundred tasks
uploaded, so that is our current priority!

The PostgreSQL wiki link has the current list of tasks available as well as
a full list of other
mentors in the program for this year, found here:
https://wiki.postgresql.org/wiki/GCI_2018

Thank you very much for your interest - we're thrilled to have you be a
part of GCI 2018!

Sarah

On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan 
wrote:

> Honorable Concern,
>
> I want to join GCI as a mentor, please guide me about the procedure,
> thanks in anticipation.
>
> --
> Regards
> Tahir Ramzan
> MSCS Research Scholar
> Google Summer of Code 2015 (CiviCRM)
> Google Summer of Code 2016 (ModSecurity)
> Outside Collaborator of SpiderLabs (Powered by TrustWave)
> Google Android Students Club Facilitator and Organizer 2015
>
> Contact:
>
> +92-312-5518018
>
> tahirram...@alumni.vu.edu.pk
>
>
> More details about me and my work:
>
> GitHub Profile: https://github.com/tahirramzan
>
> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
>


-- 
Sarah Conway Schnurr


auto_explain: Include JIT output if applicable

2018-09-24 Thread Lukas Fittl
Hi,

Whilst playing around with auto_explain and JIT today, I realized that
auto_explain currently doesn't output JIT information, which is rather
unfortunate when analyzing a larger set of queries in a semi-automated
manner.

Attached a trivial patch that fixes the issue and adds JIT information to
auto_explain with the same logic as used for regular EXPLAIN.

Thanks,
Lukas

-- 
Lukas Fittl


auto_explain-include-jit-output-v1.patch
Description: Binary data


Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Andres Freund
On 2018-09-24 18:25:46 +, Nasby, Jim wrote:
> 
> > On Sep 21, 2018, at 12:43 PM, Andres Freund  wrote:
> > 
> >> But as far as I can see it is possible to have an aggressive non-wraparound vacuum.
> >> One important difference - regular and aggressive regular can be canceled
> >> by a backend; wraparound autovacuum can not. (by checking 
> >> PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c )
> > 
> > Yes, without checking the code, they should be different. Aggressive is
> > controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by
> > autovacuum_freeze_max_age (but also implies aggressive).
> 
> Right, except that by the time you get into the vacuum code itself nothing 
> should really care about that difference. AFAICT, the only thing 
> is_wraparound is being used for is to set MyPgXact->vacuumFlags |= 
> PROC_VACUUM_FOR_WRAPAROUND, which prevents the deadlock detector from killing 
> an autovac process that’s trying to prevent a wraparound. I think it’d be 
> clearer to remove is_wraparound and move the check from vacuum_rel() into 
> lazy_vacuum_rel() (which is where the limits for HeapTupleSatisfiesVacuum get 
> determined). Something like the attached.

I'm very doubtful this is an improvement. Especially with the upcoming
pluggable storage work making vacuumlazy.c heap specific, while vacuum.c
stays generic.  The concept of something like
PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much
(even if criteria for it might).

Greetings,

Andres Freund



Re: Query is over 2x slower with jit=on

2018-09-24 Thread Andres Freund
Hi,

On 2018-09-19 20:39:22 -0700, Andres Freund wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > That's going in the right direction.  Personally I'd make the last line
> > more like
> >
> > Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, 
> > emission 14.607 ms, total 43.4 ms
>
> Yea, that's probably easier to read.

I'm wondering about upper-casing the individual times (and options) -
we're largely upper-casing properties, and for json/xml output each
would still be a property. Seems a tad bit more consistent.  I now have:

FORMAT text:
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.298 ms, Inlining 2.250 ms, Optimization 5.797 ms, Emission 5.246 ms, Total 13.591 ms

FORMAT xml:
 <JIT>
   <Functions>2</Functions>
   <Options>
     <Inlining>true</Inlining>
     <Optimization>true</Optimization>
     <Expressions>true</Expressions>
     <Deforming>true</Deforming>
   </Options>
   <Timing>
     <Generation>0.651</Generation>
     <Inlining>2.260</Inlining>
     <Optimization>14.752</Optimization>
     <Emission>7.764</Emission>
     <Total>25.427</Total>
   </Timing>
 </JIT>

FORMAT json:
 "JIT": {
   "Functions": 2,
   "Options": {
 "Inlining": true,
 "Optimization": true,
 "Expressions": true,
 "Deforming": true
   },
   "Timing": {
 "Generation": 0.238,
 "Inlining": 0.807,
 "Optimization": 4.661,
 "Emission": 4.236,
 "Total": 9.942
   }
 },

>
> > (total at the end seems more natural to me, YMMV).
>
> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.
>
>
> > Also, the "options" format you suggest here seems a bit too biased
> > towards binary on/off options --- what happens when there's a
> > three-way option?  So maybe that line should be like
> >
> > Options: inlining on, optimization on
> >
> > though I'm less sure about that part.

Now that space is less of a concern, I added expressions, and deforming
as additional options - seems reasonable to have all PGJIT_* options
imo.

Btw, I chose true/false rather than on/off, to be consistent with
ExplainPropertyBool - but I've no strong feelings about it.

Greetings,

Andres Freund



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-09-24 Thread Peter Geoghegan
On Wed, Sep 19, 2018 at 11:23 AM Peter Geoghegan  wrote:
> 3 modes
> ---
>
> My new approach is to teach _bt_findsplitloc() 3 distinct modes of
> operation: Regular/default mode, many duplicates mode, and single
> value mode.

I think that I'll have to add a fourth mode, since I came up with
another strategy that is really effective though totally complementary
to the other 3 -- "multiple insertion point" mode. Credit goes to
Kevin Grittner for pointing out that this technique exists about 2
years ago [1]. The general idea is to pick a split point just after
the insertion point of the new item (the incoming tuple that prompted
a page split) when it looks like there are localized monotonically
increasing ranges.  This is like a rightmost 90:10 page split, except
the insertion point is not at the rightmost page on the level -- it's
rightmost within some local grouping of values.

This makes the two largest TPC-C indexes *much* smaller. Previously,
they were shrunk by a little over 5% by using the new generic
strategy, a win that now seems like small potatoes. With this new
mode, TPC-C's order_line primary key, which is the largest index of
all, is ~45% smaller following a standard initial bulk load at
scalefactor 50. It shrinks from 99,085 blocks (774.10 MiB) to 55,020
blocks (429.84 MiB). It's actually slightly smaller than it would be
after a fresh REINDEX with the new strategy. We see almost as big a
win with the second largest TPC-C index, the stock table's primary key
-- it's ~40% smaller.

Here is the definition of the biggest index, the order line primary key index:

pg@tpcc[3666]=# \d order_line_pkey
 Index "public.order_line_pkey"
  Column   │  Type   │ Key? │ Definition
───┼─┼──┼
 ol_w_id   │ integer │ yes  │ ol_w_id
 ol_d_id   │ integer │ yes  │ ol_d_id
 ol_o_id   │ integer │ yes  │ ol_o_id
 ol_number │ integer │ yes  │ ol_number
primary key, btree, for table "public.order_line"

The new strategy/mode works very well because we see monotonically
increasing inserts on ol_number (an order's item number), but those
are grouped by order. It's kind of an adversarial case for our
existing implementation, and yet it seems like it's probably a fairly
common scenario in the real world.

Obviously these are very significant improvements. They really exceed
my initial expectations for the patch. TPC-C is generally considered
to be by far the most influential database benchmark of all time, and
this is something that we need to pay more attention to. My sense is
that the TPC-C benchmark is deliberately designed to almost require
that the system under test have this "multiple insertion point" B-Tree
optimization, suffix truncation, etc. This is exactly the same index
that we've seen reports of out of control bloat on when people run
TPC-C over hours or days [2].

My next task is to find heuristics to make the new page split
mode/strategy kick in when it's likely to help, but not kick in when
it isn't (when we want something close to a generic 50:50 page split).
These heuristics should look similar to what I've already done to get
cases with lots of duplicates to behave sensibly. Anyone have any
ideas on how to do this? I might end up inferring a "multiple
insertion point" case from the fact that there are multiple
pass-by-value attributes for the index, with the new/incoming tuple
having distinct-to-immediate-left-tuple attribute values for the last
column, but not the first few. It also occurs to me to consider the
fragmentation of the page as a guide, though I'm less sure about that.
I'll probably need to experiment with a variety of datasets before I
settle on something that looks good. Forcing the new strategy without
considering any of this actually works surprisingly well on cases
where you'd think it wouldn't, since a 50:50 page split is already
something of a guess about where future insertions will end up.

[1] 
https://postgr.es/m/CACjxUsN5fV0kV=yirxwa0s7lqoojuy7soptipdhucemhgwo...@mail.gmail.com
[2] https://www.commandprompt.com/blog/postgres_autovacuum_bloat_tpc-c/
-- 
Peter Geoghegan



Re: Allowing printf("%m") only where it actually works

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
>> Rebase attached --- no substantive changes.

> -   if (handleDLL == NULL)
> -   ereport(FATAL,
> -   (errmsg_internal("could not load netmsg.dll: error
> -code %lu", GetLastError(;

> In 0001, this is replaced by a non-FATAL error for the backend, which
> does not seem like a good idea to me because the user loses visibility
> with this DLL which cannot be loaded.  I still have to see this error...

Well, we have to change the code somehow to make it usable in frontend
as well as backend.  And we can *not* have it do exit(1) in libpq.
So the solution I chose was to make it act the same as if FormatMessage
were to fail.  I don't find this behavior unreasonable: what is really
important is the original error code, not whether we were able to
pretty-print it.  I think the ereport(FATAL) coding is a pretty darn
bad idea even in the backend.

> Could you drop the configure checks for snprintf and vsnprintf in a
> separate patch?  The cleanup of %m and the removal of those switches
> should be treated separatly in my opinion.

Seems a bit make-worky, but here you go.  0001 is the same as before
(but rebased up to today, so some line numbers change).  0002
changes things so that we always use our snprintf, removing all the
configure logic associated with that.  0003 implements %m in snprintf.c
and adjusts our various printf-wrapper functions to ensure that they
pass errno through reliably.  0004 changes elog.c to rely on %m being
implemented below it.

regards, tom lane

diff --git a/configure b/configure
index 9b30402..afbc142 100755
*** a/configure
--- b/configure
*** esac
*** 15635,15653 
  
  fi
  
- ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror"
- if test "x$ac_cv_func_strerror" = xyes; then :
-   $as_echo "#define HAVE_STRERROR 1" >>confdefs.h
- 
- else
-   case " $LIBOBJS " in
-   *" strerror.$ac_objext "* ) ;;
-   *) LIBOBJS="$LIBOBJS strerror.$ac_objext"
-  ;;
- esac
- 
- fi
- 
  ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat"
  if test "x$ac_cv_func_strlcat" = xyes; then :
$as_echo "#define HAVE_STRLCAT 1" >>confdefs.h
--- 15635,15640 
diff --git a/configure.in b/configure.in
index 2e60a89..6973b77 100644
*** a/configure.in
--- b/configure.in
*** else
*** 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen])
  
  case $host_os in
  
--- 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen])
  
  case $host_os in
  
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index f4356fe..af35cfb 100644
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
*** pgwin32_select(int nfds, fd_set *readfds
*** 690,728 
  		memcpy(writefds, &outwritefds, sizeof(fd_set));
  	return nummatches;
  }
- 
- 
- /*
-  * Return win32 error string, since strerror can't
-  * handle winsock codes
-  */
- static char wserrbuf[256];
- const char *
- pgwin32_socket_strerror(int err)
- {
- 	static HANDLE handleDLL = INVALID_HANDLE_VALUE;
- 
- 	if (handleDLL == INVALID_HANDLE_VALUE)
- 	{
- 		handleDLL = LoadLibraryEx("netmsg.dll", NULL, DONT_RESOLVE_DLL_REFERENCES | LOAD_LIBRARY_AS_DATAFILE);
- 		if (handleDLL == NULL)
- 			ereport(FATAL,
- 	(errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError())));
- 	}
- 
- 	ZeroMemory(&wserrbuf, sizeof(wserrbuf));
- 	if (FormatMessage(FORMAT_MESSAGE_IGNORE_INSERTS |
- 	  FORMAT_MESSAGE_FROM_SYSTEM |
- 	  FORMAT_MESSAGE_FROM_HMODULE,
- 	  handleDLL,
- 	  err,
- 	  MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT),
- 	  wserrbuf,
- 	  sizeof(wserrbuf) - 1,
- 	  NULL) == 0)
- 	{
- 		/* Failed to get id */
- 		sprintf(wserrbuf, "unrecognized winsock error %d", err);
- 	}
- 	return wserrbuf;
- }
--- 690,692 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 16531f7..22e5d87 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** static void send_message_to_server_log(E
*** 178,185 
  static void write_pipe_chunks(char *data, int len, int dest);
  static void send_message_to_frontend(ErrorData *edata);
  static char *expand_fmt_string(const char *fmt, ErrorData *edata);
- static const char *useful_strerror(int errnum);
- static const char *get_errno_symbol(int errnum);
  static const char *error_severity(int elevel);
  static void append_with_tabs(StringInfo buf, const char *str);
  static bool is_log_level_output(int elevel, int log_min_level);
--- 178,183 
*** expand_fmt_string(const char 

Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Andres Freund
On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> I did some more reading.
> 
> On Mon, Sep 24, 2018 at 10:15 AM Chris Travers 
> wrote:
> 
> > First, thanks for taking the time to write this.  Its very helpful.
> > Additional thoughts inline.
> >
> > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> > wrote:
> >
> >>
> >> There could be value in refactoring things so as all the *Pending flags
> >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> >> uses bit-wise markers, as that's at least 4 bytes because that's stored
> >> as an int for most platforms and can be performed as an atomic operation
> >> safely across signals (If my memory is right;) ).  And this leaves a lot
> >> of room for future flags.
> >>
> >
> > Yeah I will look into this.
> >
> 
> 
> Ok so having looked into this a bit more
> 
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11
> 
> What I would suggest instead at least for an initial approach is:
> 
> 1.  A struct of volatile bools statically stored
> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
> 
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply with
> C89 and pose no concurrency issues.

It's certainly *NOT* ok to use atomics in signal handlers
indiscriminately; on some hardware / configure option combinations
they're backed by spinlocks (or even semaphores) and thus *NOT*
interrupt safe.

This doesn't seem to solve an actual problem, why are we discussing
changing this? What'd be measurably improved, worth the cost of making
backpatching more painful?

Greetings,

Andres Freund



Re: Collation versioning

2018-09-24 Thread Douglas Doole
It's been a bunch of years since I worked with ICU, so anything I say below
may have changed in their code or be subject to mental bit rot.

On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro 
wrote:

> Considering that to handle this we'd need to figure out
> how link libicu.so.55, libicu.so.56, ... etc into the same backend,
> and yet they presumably have the same collation names,


There's an option when compiling ICU to version-extend the API names. So,
instead of calling ucol_open(), you'd call ucol_open_55() or ucol_open_56(),
and then you can link to both libicu.so.55 and libicu.so.56 without getting
name collisions.

The side effect of this is that it's the application's responsibility to
figure out which version of ICU "en_US" should be routed to. In DB2, we
started the collation names with UCAxxx (later changed to CLDRxxx) to let
us distinguish which version of the API to call.

Admittedly that creates a whole can
> of worms for initdb-time catalog creation, package maintainers' jobs,
> how long old versions have to be supported and how you upgraded
> database objects to new ICU versions.


Yep. We never came up with a good answer for that before I left IBM. At the
time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as
part of the install bundle.

Long term, I think the only viable approach to supporting multiple versions
of ICU is runtime loading of the libraries. Then it's up to the system
administrator to make sure the necessary versions are installed on the
system.
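
To sketch what that could look like (purely hypothetical code, not something
DB2 or PostgreSQL does today): with renamed symbols, each ICU major version's
entry points could be resolved at runtime, along these lines:

#include <dlfcn.h>
#include <stdio.h>

/* ucol_open() really takes UErrorCode * and returns UCollator *; simplified */
typedef void *(*ucol_open_fn) (const char *loc, int *status);

int
main(void)
{
	/* assumed soname; the collation entry points live in ICU's i18n library */
	void	   *icu56 = dlopen("libicui18n.so.56", RTLD_NOW | RTLD_LOCAL);
	ucol_open_fn open56;
	int			status = 0;

	if (icu56 == NULL)
	{
		fprintf(stderr, "could not load ICU 56: %s\n", dlerror());
		return 1;
	}

	/* with renaming enabled, the exported symbol carries the major version */
	open56 = (ucol_open_fn) dlsym(icu56, "ucol_open_56");
	if (open56 == NULL)
	{
		fprintf(stderr, "could not find ucol_open_56: %s\n", dlerror());
		return 1;
	}

	/* open an en_US collator from the version-56 library specifically */
	(void) open56("en_US", &status);
	return 0;
}

The application (or database) still has to decide which version a given
collation name should be routed to, as described above.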

Yeah, it seems like ICU is *also* subject to minor changes that happen
> under your feet, much like libc.  For example maintenance release 60.2
> (you can't install that at the same time as 60.1, but you can install
> it at the same time as 59.2).  You'd be linked against libicu.so.60
> (and thence libicudata.so.60), and it gets upgraded in place when you
> run the local equivalent of apt-get upgrade.
>

This always worried me because an unexpected collation change is so painful
for a database. And I was never able to think of a way of reliably testing
compatibility either because of ICU's ability to reorder and group
characters when collating.


Re: Missing const in DSA.

2018-09-24 Thread Tom Lane
Thomas Munro  writes:
> On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
>> Mark G  writes:
>>> While looking at some of the recent churn in DSA I noticed that
>>> dsa_size_class_map should probably be declared const.

>> +1 ... also, given the contents of the array, "char" seems like
>> rather a misnomer.  I'd be happier if it were declared as uint8, say.

> +1

Are you planning to take care of this?

regards, tom lane



Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> This should be back-patched.  Any opinions about bumping up this
> extension version in back-branches like what has been done in 53b79ab4?

Yes, you need to bump the extension version to change anything in the
extension's script file.

For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
need to add a delta script not a new base script.  (If you had to
back-patch before v10, it might be best to add a new base script in all
the branches just to keep the patches consistent; but IIUC this issue only
arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
prototype instead.

regards, tom lane



Re: Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-24 Thread Alvaro Herrera
On 2018-Sep-20, Marco Slot wrote:

> We're seeing a segmentation fault when creating a partition of a
> partitioned table with a primary key when there is a sql_drop trigger on
> Postgres 11beta4.
> 
> We discovered it because the Citus extension creates a sql_drop trigger,
> but it's otherwise unrelated to the Citus extension:
> https://github.com/citusdata/citus/issues/2390

Thanks for the reproducer.  Will research.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Multiple primary key on partition table?

2018-09-24 Thread Alvaro Herrera
[Back from a week AFK ...]

On 2018-Sep-18, amul sul wrote:

> On Mon, Sep 17, 2018 at 9:06 PM amul sul  wrote:
> >
> > Nice catch Rajkumar.

> Here is the complete patch proposes the aforesaid fix with regression test.

Looks closely related to the one that was fixed in 1f8a3327a9db, but of
course it's a different code path.  Will review this shortly, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote:
> Attached new rebased version of the patch that enhances the
> pg_stat_statements_reset()
> function. This needs to be applied on top of the patch that is posted in
> [1].

+CREATE ROLE stats_regress_user1;
+CREATE ROLE stats_regress_user2;
Just a short note: regression tests creating roles should use regress_
as prefix.
--
Michael


signature.asc
Description: PGP signature


Re: [patch]overallocate memory for curly braces in array_out

2018-09-24 Thread Tom Lane
Keiichi Hirobe  writes:
> Attached is a patch that fixes a bug
> that miscounts the total number of curly braces in the output string in array_out.

Wow, good catch!

Testing this, I found there's a second way in which the space calculation
is off: it always allocated one more byte than required, as a result of
counting one more comma than is really required.  That's not nearly as
significant as the curly-brace miscount, but it still got in the way of
doing this:

*** 1234,1239 
--- 1243,1251 
  #undef APPENDSTR
  #undef APPENDCHAR
  
+   /* Assert that we calculated the string length accurately */
+   Assert(overall_length == (p - retval + 1));
+ 
pfree(values);
pfree(needquotes);
  

which seemed to me like a good idea now that we know this code isn't
so perfect as all that.

Will push shortly.

regards, tom lane



Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:09 AM, Joe Conway wrote:
> On 09/24/2018 10:01 AM, Tom Lane wrote:
>> Joe Conway  writes:
>>> Having seen none, committed/pushed. This did not seem worth
>>> back-patching, so I only pushed to master.
>> 
>> I don't see anything on gitmaster?
> 
> Hmm, yes, interesting -- I must have messed up my local git repo somehow.
> Will try again.

This time it seems to have worked. Sorry for the noise earlier :-/

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:01 AM, Tom Lane wrote:
> Joe Conway  writes:
>> Having seen none, committed/pushed. This did not seem worth
>> back-patching, so I only pushed to master.
> 
> I don't see anything on gitmaster?

Hmm, yes, interesting -- I must have messed up my local git repo somehow.
Will try again.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Implementing SQL ASSERTION

2018-09-24 Thread Peter Eisentraut
On 29/04/2018 20:18, Joe Wildish wrote:
> On 28 Mar 2018, at 16:13, David Fetter  wrote:
>>
>> Sorry to bother you again, but this now doesn't compile atop master.
> 
> Attached is a rebased patch for the prototype.

I took a look at this.

This has been lying around for a few months, so it will need to be
rebased again.  I applied this patch on top of
68e7e973d22274a089ce95200b3782f514f6d2f8, which was the HEAD around the
time this patch was created, and it applies cleanly there.

Please check your patch for whitespace errors:

warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

Also, reduce the amount of useless whitespace changes in the patch.

There are some compiler warnings:

constraint.c: In function 'CreateAssertion':
constraint.c:1211:2: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]

constraint.c: In function 'oppositeDmlOp':
constraint.c:458:1: error: control reaches end of non-void function
[-Werror=return-type]

The version check in psql's describeAssertions() needs to be updated.
Also, you should use formatPGVersionNumber() to cope with two-part and
one-part version numbers.

All this new code in constraint.c that checks the assertion expression
needs more comments and documentation.

Stuff like this isn't going to work:

static int
funcMaskForFuncOid(Oid funcOid)
{
char *name = get_func_name(funcOid);

if (name == NULL)
return OTHER_FUNC;
else if (strncmp(name, "min", strlen("min")) == 0)
return MIN_AGG_FUNC;
else if (strncmp(name, "max", strlen("max")) == 0)
return MAX_AGG_FUNC;

You can't assume from the name of a function what it's going to do.
Solving this properly might be hard.

The regression test crashes for me around

frame #4: 0x00010d3a4cdc postgres`castNodeImpl(type=T_SubLink,
ptr=0x7ff27006d230) at nodes.h:582
frame #5: 0x00010d3a61c6
postgres`visitSubLink(node=0x7ff270034040, info=0x7ffee2a23930)
at constraint.c:843

This ought to be reproducible for you if you build with assertions.


My feeling is that if we want to move forward on this topic, we need to
solve the concurrency question first.  All these optimizations for when
we don't need to check the assertion are cool, but they are just
optimizations that we can apply later on, once we have solved the
critical problems.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal for disk quota feature

2018-09-24 Thread Hubert Zhang
>
> The quotas or object limits, resource limits are pretty useful and
> necessary, but I don't see these like new type of objects, it is much more
> some property of current objects. Because we have one syntax for this
> purpose I prefer it. Because is not good to have two syntaxes for similar
> purpose.

SCHEMA and TABLE are OK for me, but as I mentioned before, ROLE is a
special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database-level; e.g. pg_class and pg_namespace
both reside in one database. But ROLE is cluster-level: roles don't
belong to a database. At first glance, ALTER ROLE XXX SET disk_quota = xxx
means setting the quota for the user on all databases. But in our
first-stage design, a ROLE's quota is bound to a specific database. E.g. role
Jack could have a 10GB quota on database A and a 2GB quota on database B.

The SQL syntax is not hard to change, and I don't think it should block the
main design of the disk quota feature. Are there any comments on the design
and architecture? If not, we'll submit our patch first and continue the
discussion from there.

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule 
wrote:

>
>
> so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang  napsal:
>
>> But it looks like redundant to current GUC configuration and limits
>>
>> what do you mean by current GUC configuration? Is that the general block
>> number limit in your patch? If yes, the difference between GUC and
>> pg_diskquota catalog is that pg_diskquota will store different quota limit
>> for the different role, schema or table instead of a single GUC value.
>>
>
> storage is not relevant in this moment.
>
> I don't see to consistent to sets some limits via SET command, or ALTER X
> SET, and some other with CREATE QUOTA ON.
>
> The quotas or object limits, resource limits are pretty useful and
> necessary, but I don't see these like new type of objects, it is much more
> some property of current objects. Because we have one syntax for this
> purpose I prefer it. Because is not good to have two syntaxes for similar
> purpose.
>
> So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer
>
> ALTER SCHEMA xxx SET disc_quota = xxx;
>
> The functionality is +/- same. But ALTER XX SET was introduce first, and I
> don't feel comfortable to have any new syntax for similar purpose
>
> Regards
>
> Pavel
>
>
>
>
>
>>
>> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang 
>>> napsal:
>>>
 just fast reaction - why QUOTA object?
> Isn't ALTER SET enough?
> Some like
> ALTER TABLE a1 SET quote = 1MB;
> ALTER USER ...
> ALTER SCHEMA ..
> New DDL commans looks like too hard hammer .


 It's an option. Prefer to consider quota setting store together:
 CREATE DISK QUOTA way is more nature to store quota setting in a
 separate pg_diskquota catalog
 While ALTER SET way is more close to store quota setting in pg_class,
 pg_role, pg_namespace. etc in an integrated way.
 (Note that here I mean nature/close is not must, ALTER SET could also
 store in pg_diskquota and vice versa.)

>>>
>>> I have not a problem with new special table for storing this
>>> information. But it looks like redundant to current GUC configuration and
>>> limits. Can be messy do some work with ALTER ROLE, and some work via CREATE
>>> QUOTE.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
 Here are some differences I can think of:
 1. pg_role is a global catalog, not per-database. It's harder to
 track the user's disk usage across the whole cluster (considering 1000+
 databases).  So the semantics of CREATE DISK QUOTA ON USER are limited: it
 only tracks the user's disk usage inside the current database.
 2. Using a separate pg_diskquota could add more fields beyond the quota
 limit without adding too many fields to pg_class, e.g. a red zone to give the
 user a warning, or the current disk usage of the db objects.

 On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
 wrote:

>
>
> pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang 
> napsal:
>
>>
>>
>>
>>
>> *Hi all,We redesign disk quota feature based on the comments from
>> Pavel Stehule and Chapman Flack. Here are the new 
>> design.OverviewBasically,
>>  disk quota feature is used to support multi-tenancy environment, 
>> different
>> level of database objects could be set a quota limit to avoid over use of
>> disk space. A common case could be as follows: DBA could enable disk 
>> quota
>> on a specified database list. DBA could set disk quota limit for
>> tables/schemas/roles in these databases. Separate disk quota worker 
>> process
>> will monitor the disk usage for these objects and detect the objects 
>> which
>> exceed their quota limit. Queries loading data into these “out of disk
>> quota” tables/schemas/roles will be cancelled.We are 

Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/21/2018 01:51 PM, Joe Conway wrote:
> On 09/19/2018 11:18 AM, Joe Conway wrote:
>> On 09/19/2018 10:54 AM, Tom Lane wrote:
>>> So maybe what we really need is a table of operators not functions.
>> 
>> Good idea -- I will take a look at that.
>> 
>>> However, I don't object to documenting any function that has its
>>> own pg_description string.
> 
> Ok, so the attached version refactors/splits the group into two tables
> -- operators and functions.



> I also included John Naylor's patch with some minor editorialization.
> 
> Any further comments or complaints?

Having seen none, committed/pushed. This did not seem worth
back-patching, so I only pushed to master.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
Very minor revision
On Mon, Sep 24, 2018 at 11:45 AM Chris Travers 
wrote:

>
> Ok so having looked into this a bit more
>
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11
>
> What I would suggest instead at least for an initial approach is:
>
> 1.  A struct of volatile bools statically stored
>

These would be implemented as sig_atomic_t which is defined in C89 but has
no atomic operators other than writing the full value.


> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
>
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply with
> C89 and pose no concurrency issues.
>
> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Make deparsing of column defaults faster

2018-09-24 Thread Peter Eisentraut
On 30/07/2018 13:51, Jeff Janes wrote:
> Any thoughts on how to proceed here?  It seems there is more work to do
> to cover all the issues with dumping and restoring tables with many
> columns.  Since the original report was in the context of pg_upgrade, we
> should surely address at least the pg_restore slowness.
> 
> I'll work on solving the problem using a hash table at the lowest
> level (making column names unique), for a future commit fest.  That
> should drop it from N^3 to N^2, which since N can't go above 1600 should
> be good enough.
> 
> So we can set this to rejected, as that will be an entirely different
> approach.  
> 
> Your caching patch might be worthwhile on its own, though.

I'm going to set this thread as returned with feedback until we have a
more complete solution.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Hint to set owner for tablespace directory

2018-09-24 Thread Peter Eisentraut
On 11/09/2018 17:10, Peter Eisentraut wrote:
> On 07/09/2018 17:59, Maksim Milyutin wrote:
>> those directories was that user). The error message "could not set 
>> permissions on directory ..." disoriented that user. The need to change 
>> the owner of those directories came after careful reading of 
>> documentation. I think it would be helpful to show the proposed hint to 
>> more operatively resolve the problem.
> 
> I think it might be worth clarifying the documentation instead.  I'm
> looking at the CREATE TABLESPACE reference page and it's not super clear
> on first reading.

How about the attached patch?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eb0796921b6ebfe2375037d903a081a7b8f45c0b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 24 Sep 2018 14:47:09 +0200
Subject: [PATCH] doc: Clarify CREATE TABLESPACE documentation

Be more specific about when and how to create the directory and what
permissions it should have.
---
 doc/src/sgml/ref/create_tablespace.sgml | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/create_tablespace.sgml 
b/doc/src/sgml/ref/create_tablespace.sgml
index 18fa5f0ebf..c621ec2c6b 100644
--- a/doc/src/sgml/ref/create_tablespace.sgml
+++ b/doc/src/sgml/ref/create_tablespace.sgml
@@ -92,7 +92,8 @@ Parameters
   

 The directory that will be used for the tablespace. The directory
-should be empty and must be owned by the
+must exist (CREATE TABLESPACE will not create it),
+should be empty, and must be owned by the
 PostgreSQL system user.  The directory must 
be
 specified by an absolute path name.

@@ -137,15 +138,23 @@ Notes
   Examples
 
   
-   Create a tablespace dbspace at 
/data/dbs:
+   To create a tablespace dbspace at file system location
+   /data/dbs, first create the directory using operating
+   system facilities and set the correct ownership:
+
+mkdir /data/dbs
+chown postgres:postgres /data/dbs
+
+   Then issue the tablespace creation command inside
+   PostgreSQL:
 
 CREATE TABLESPACE dbspace LOCATION '/data/dbs';
 
   
 
   
-   Create a tablespace indexspace at 
/data/indexes
-   owned by user genevieve:
+   To create a tablespace owned by a different database user, use a command
+   like this:
 
 CREATE TABLESPACE indexspace OWNER genevieve LOCATION '/data/indexes';
 
-- 
2.19.0



Re: Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-24 Thread Justin Pryzby
On Thu, Sep 20, 2018 at 12:00:18PM +0200, Marco Slot wrote:
> We're seeing a segmentation fault when creating a partition of a
> partitioned table with a primary key when there is a sql_drop trigger on
> Postgres 11beta4.

Thanks for reporting; I reproduced this easily, so I added it to the open items
list, since indices on partitioned tables are a feature new in PG11.

Core was generated by `postgres: pryzbyj ts [local] CREATE TABLE '.
Program terminated with signal 11, Segmentation fault.
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
1745event_trigger.c: No such file or directory.
in event_trigger.c

(gdb) bt
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
#1  0x005dfbd3 in AlterTableInternal (relid=40108800, cmds=0x21c39a8, 
recurse=true) at tablecmds.c:3328
#2  0x005b5b7b in DefineIndex (relationId=40108800, stmt=0x21a7350, 
indexRelationId=0, parentIndexId=40084714, parentConstraintId=40084715,
is_alter_table=false, check_rights=false, check_not_in_use=false, 
skip_build=false, quiet=false) at indexcmds.c:669
#3  0x005dcfee in DefineRelation (stmt=0x2116690, relkind=114 'r', 
ownerId=17609, typaddress=0x0,
queryString=0x20f1e60 "CREATE TABLE collections_list_1\nPARTITION OF 
collections_list (key, ts, collection_id, value)\nFOR VALUES IN (1);") at 
tablecmds.c:946
[...]

Justin



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
I did some more reading.

On Mon, Sep 24, 2018 at 10:15 AM Chris Travers 
wrote:

> First, thanks for taking the time to write this.  Its very helpful.
> Additional thoughts inline.
>
> On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> wrote:
>
>>
>> There could be value in refactoring things so as all the *Pending flags
>> of miscadmin.h get stored into one single volatile sig_atomic_t which
>> uses bit-wise markers, as that's at least 4 bytes because that's stored
>> as an int for most platforms and can be performed as an atomic operation
>> safely across signals (If my memory is right;) ).  And this leaves a lot
>> of room for future flags.
>>
>
> Yeah I will look into this.
>


Ok so having looked into this a bit more

It looks like to be strictly conforming, you can't just use a series of
flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
round-trip safe in signal handlers.  In other words, if you read, are
pre-empted by another signal, and then write, you may clobber what the
other signal handler did to the variable.  So you need atomics, which are
C11

What I would suggest instead at least for an initial approach is:

1.  A struct of volatile bools statically stored
2.  macros for accessing/setting/clearing flags
3.  Consistent use of these macros throughout the codebase.

To make your solution work it looks like we'd need C11 atomics which would
be nice and maybe at some point we decide to allow newer feature, or we
could wrap this itself in checks for C11 features and provide atomic flags
in the future.  It seems that the above solution would strictly comply with
C89 and pose no concurrency issues.
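
Roughly, I'm picturing something along these lines (a sketch only, with
made-up names rather than the actual miscadmin.h flags):

#include <signal.h>

/*
 * One sig_atomic_t per flag: a signal handler only ever stores a whole
 * value, so no read-modify-write is needed from signal context.
 */
typedef struct PendingSignalFlags
{
	volatile sig_atomic_t interrupt_pending;
	volatile sig_atomic_t query_cancel_pending;
	volatile sig_atomic_t proc_die_pending;
} PendingSignalFlags;

static PendingSignalFlags pending_signals;

#define PENDING_SIGNAL_SET(flag)	(pending_signals.flag = 1)
#define PENDING_SIGNAL_CLEAR(flag)	(pending_signals.flag = 0)
#define PENDING_SIGNAL_IS_SET(flag)	(pending_signals.flag != 0)

A handler would then do PENDING_SIGNAL_SET(query_cancel_pending), and the
main loop would test PENDING_SIGNAL_IS_SET(query_cancel_pending) in its
CHECK_FOR_INTERRUPTS()-style checks.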

-- 
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Pluggable Storage - Andres's take

2018-09-24 Thread Alexander Korotkov
On Mon, Sep 24, 2018 at 8:04 AM Haribabu Kommi  wrote:
> On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov 
>  wrote:
>>
>> On Fri, Aug 24, 2018 at 5:50 AM Andres Freund  wrote:
>> > I've pushed a current version of that to my git tree to the
>> > pluggable-storage branch. It's not really a version that I think makese
>> > sense to review or such, but it's probably more useful if you work based
>> > on that.  There's also the pluggable-zheap branch, which I found
>> > extremely useful to develop against.
>>
>> BTW, I'm going to take a look at current shape of this patch and share
>> my thoughts.  But where are the branches you're referring?  On your
>> postgres.org git repository pluggable-storage brach was updates last
>> time at June 7.  And on the github branches are updated at August 5
>> and 14, and that is still much older than your email (August 24)...
>
>
> The code is the latest, but the commit time is older; I feel that is because
> of a commit squash.
>
> pluggable-storage is the branch where the pluggable storage code is present
> and pluggable-zheap branch where zheap is rebased on top of pluggable
> storage.

Got it, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Global snapshots

2018-09-24 Thread Andrey Borodin
Hi!

I want to review this patch set, though I understand that it will probably be
quite a long process.

I like the idea that with this patch set all Postgres instances are universally
bound into a single distributed DB, even if they have never heard about each
other before :) This is just amazing. Or am I getting something wrong?

I've got a few questions:
1. If we coordinate HA clusters with replicas, can replicas participate if
their part of the transaction is read-only?
2. How do InDoubt transactions behave when we add or subtract leap seconds?

Also, I could not understand some notes from Arseny:

> 25 июля 2018 г., в 16:35, Arseny Sher  написал(а):
> 
> * One drawback of these patches is that only REPEATABLE READ is
>   supported. For READ COMMITTED, we must export every new snapshot
>   generated on coordinator to all nodes, which is fairly easy to
>   do. SERIALIZABLE will definitely require chattering between nodes,
>   but that's much less demanded isolevel (e.g. we still don't support
>   it on replicas).

If all shards are executing transactions in SERIALIZABLE, what anomalies does it
permit?

If you have transactions on server A and server B, there are transactions 1 and 
2, transaction A1 is serialized before A2, but B1 is after B2, right?

Maybe we can somehow abort 1 or 2? 

> 
> * Another somewhat serious issue is that there is a risk of recency
>   guarantee violation. If client starts transaction at node with
>   lagging clocks, its snapshot might not include some recently
>   committed transactions; if client works with different nodes, she
>   might not even see her own changes. CockroachDB describes at [1] how
>   they and Google Spanner overcome this problem. In short, both set
>   hard limit on maximum allowed clock skew.  Spanner uses atomic
>   clocks, so this skew is small and they just wait it at the end of
>   each transaction before acknowledging the client. In CockroachDB, if
>   tuple is not visible but we are unsure whether it is truly invisible
>   or it's just the skew (the difference between snapshot and tuple's
>   csn is less than the skew), transaction is restarted with advanced
>   snapshot. This process is not infinite because the upper border
>   (initial snapshot + max skew) stays the same; this is correct as we
>   just want to ensure that our xact sees all the committed ones before
>   it started. We can implement the same thing.
I think that this situation is also covered in Clock-SI since transactions will 
not exit the InDoubt state before we can see them. But I'm not sure; chances are
I'm getting something wrong, so I'll think more about it. I'd be happy to hear 
comments from Stas about this.
> 
> 
> * 003_bank_shared.pl test is removed. In current shape (loading one
>   node) it is useless, and if we bombard both nodes, deadlock surely
>   appears. In general, global snapshots are not needed for such
>   multimaster-like setup -- either there are no conflicts and we are
>   fine, or there is a conflict, in which case we get a deadlock.
Can we do something about this deadlock? Would placing an upper limit on the 
time spent in the InDoubt state fix the issue? I understand that aborting 
automatically is kind of dangerous...

Also, a hanging 2PC transaction can currently cause a lot of headaches for 
the DBA. Can we have some kind of protection for the case where one node is 
gone permanently during a transaction?

Thanks!

Best regards, Andrey Borodin.


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
First, thanks for taking the time to write this.  It's very helpful.
Additional thoughts inline.

On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier  wrote:

> On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote:
> > I understand how lock levels don't fit a simple hierarchy, but at least
> > when it comes to what is going to be aborted on a signal, I am having
> > trouble understanding the problem here.
>
> It may be possible to come up with a clear hierarchy with the current
> interruption types in place.  Still, I am not sure that the definition
> you put behind it is completely correct, and I think we also need to
> question the value of putting such restrictions on future interruption
> types, because they would need to fit into it.


The future-safety issue is a really good one, and it's one reason I kept the
infinite-loop patch as semantically consistent with the API as I could, at
the cost of some complexity.

I have another area in mind where I think a refactoring patch would be more
valuable anyway.


>   That's quite
> a heavy constraint to live with.  There is such logic with wal_level for
> example, which is something I am not completely happy with either...
> But this one is a story for another time, and another thread.
>

From a cleanup perspective, a concentric-circles approach (which would
correspond to a hierarchy of interrupts) seems correct to me, but I can see
that assuming all pending interrupts are checked solely for cleanup reasons
might be a bad assumption on my part.

>
> Regarding your patch, it seems to me that it does not improve
> readability, as I mentioned up-thread, because you lose sight of what
> can be interrupted in a given code path, which is something the current
> code actually shows nicely.
>

So I guess there are two fundamental questions here.

1.  Do we want to move away from directly checking global flags like this?
I think we do, because the lack of encapsulation can make future changes
harder and more complex.  But I don't see a point in putting effort into
that without consensus.

>
> There could be value in refactoring things so that all the *Pending flags
> of miscadmin.h get stored in one single volatile sig_atomic_t using
> bit-wise markers.  That's at least 4 bytes, as it's stored as an int on
> most platforms, and updates can be performed atomically and safely across
> signals (if my memory is right ;) ).  And this leaves a lot of room for
> future flags.
>

Yeah, I will look into this.
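
As a quick sketch of how I read that suggestion (the flag names below are 
made up for illustration, not the real miscadmin.h symbols):

#include <signal.h>

#define PENDING_PROC_DIE		(1 << 0)
#define PENDING_QUERY_CANCEL	(1 << 1)
#define PENDING_RELOAD			(1 << 2)
/* ... room for many more future flags in the same word ... */

static volatile sig_atomic_t pendingSignalFlags = 0;

/* signal handler side: set the bit corresponding to the signal */
static void
statement_cancel_handler(int postgres_signal_arg)
{
	pendingSignalFlags |= PENDING_QUERY_CANCEL;
}

/* CHECK_FOR_INTERRUPTS() side: test and clear individual bits */
static void
process_pending_signals(void)
{
	if (pendingSignalFlags & PENDING_PROC_DIE)
	{
		pendingSignalFlags &= ~PENDING_PROC_DIE;
		/* ... die processing ... */
	}
	if (pendingSignalFlags & PENDING_QUERY_CANCEL)
	{
		pendingSignalFlags &= ~PENDING_QUERY_CANCEL;
		/* ... cancel the current statement ... */
	}
}

One thing I would want to verify first is the atomicity assumption: the |= 
and &= above are read-modify-write operations, so whether they are really 
safe against a handler firing in between the read and the write would need 
checking (or the word would need explicit atomics) before relying on it.
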

Thanks again for taking the time to go over the concerns in detail.  It
really helps.

Best Wishes,
Chris Travers

> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Something fishy happening on frogmouth

2018-09-24 Thread Noah Misch
On Wed, Sep 19, 2018 at 01:03:44PM +0900, Kyotaro HORIGUCHI wrote:
> Thank you for finding and fixing this.
> 
> At Sat, 15 Sep 2018 18:21:52 -0400, Tom Lane  wrote in 
> <1.1537050...@sss.pgh.pa.us>
> > Noah Misch  writes:
> > > Usually, the first srandom() call happens early in PostmasterMain().  I 
> > > plan
> > > to add one to InitStandaloneProcess(), which substitutes for several tasks
> > > otherwise done in PostmasterMain().  That seems like a good thing even if 
> > > DSM
> > > weren't in the picture.  Also, initdb needs an srandom() somewhere;
> > > choose_dsm_implementation() itself seems fine.  Attached.
> > 
> > +1, but some comments would be good.

> +1, too.

Thanks for reviewing.  I pushed with some comments.
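
As context for the fix being discussed: the idea is simply to add an 
srandom() call in the paths that previously used random() without seeding it 
(initdb's choose_dsm_implementation() and InitStandaloneProcess()).  Below is 
a minimal sketch of the kind of call involved, purely for illustration; the 
exact placement and seed expression in the committed patch may differ:

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical illustration only: seed random() once, early, so that later
 * callers such as choose_dsm_implementation() do not all draw from the same
 * unseeded sequence.
 */
static void
seed_process_random(void)
{
	srandom((unsigned int) (getpid() ^ time(NULL)));
}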