Re: Proposal for Signal Detection Refactoring
On Tue, Sep 25, 2018 at 3:03 AM Tom Lane wrote:
> Michael Paquier writes:
> > And then within separate signal handlers things like:
> > void
> > StatementCancelHandler(SIGNAL_ARGS)
> > {
> > [...]
> > signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
> > [...]
> > }
>
> AFAICS this still wouldn't work.  The machine code is still going to
> look (on many machines) like "load from signalPendingFlags, OR in some
> bits, store to signalPendingFlags".  So there's still a window for
> another signal handler to interrupt that and store some bits that will
> get lost.
>
> You could only fix that by blocking all signal handling during the
> handler, which would be expensive and rather pointless.
>
> I do not think that it's readily possible to improve on the current
> situation with one sig_atomic_t per flag.

After a fair bit of reading, I think there are ways of doing this in C11, but I don't think those are portable to C99. In C99 (and, in practice, C89 — the C99 committee noted there were no known C89 implementations where reading was unsafe), reading or writing a static sig_atomic_t inside a signal handler is safe, but a read-modify-write round trip is *not* guaranteed not to clobber a concurrent update. While I could be wrong, I think it is only in C11 that the language itself offers round-trip operations that are guaranteed not to clobber.

So we are a long way from being able to treat these flags as bits in a single value. However, what I think one could do is use a struct of volatile sig_atomic_t members, with macros for checking/setting them; simply writing a value is safe in C89 and higher.

> regards, tom lane

--
Best Regards,
Chris Travers
Head of Database
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
Re: Strange failure in LWLock on skink in REL9_5_STABLE
On Sat, Sep 22, 2018 at 4:52 PM Amit Kapila wrote:
> On Sat, Sep 22, 2018 at 2:28 AM Andres Freund wrote:
> > On 2018-09-22 08:54:57 +1200, Thomas Munro wrote:
> > > On Fri, Sep 21, 2018 at 4:43 PM Tom Lane wrote:
> > > > Unless it looks practical to support this behavior in the Windows
> > > > and SysV cases, I think we should get rid of it rather than expend
> > > > effort on supporting it for just some platforms.
> > >
> > > We can remove it in back-branches without breaking API compatibility:
> > >
> > > 1. Change dsm_impl_can_resize() to return false unconditionally (I
> > > suppose client code is supposed to check this before using
> > > dsm_resize(), though I'm not sure why it has an "impl" in its name if
> > > it's part of the public interface of this module).
> > > 2. Change dsm_resize() and dsm_remap() to raise an error conditionally.
> > > 3. Rip out the DSM_OP_RESIZE cases from various places.
> > >
> > > Then in master, remove all of those functions completely.  However,
> > > I'd feel like a bit of a vandal.  Robert and Amit probably had plans
> > > for that code...?
> >
> > Robert, Amit: ^
>
> I went through and checked the original proposal [1] to see if any use
> case is mentioned there, but nothing related has been discussed.  I
> couldn't think of much use for this facility except maybe for something
> like parallelizing correlated sub-queries, where the size of the outer
> var can change across executions and we might need to resize the
> initially allocated memory.  This is just a wild thought; I don't have
> any concrete idea about this.  Having said that, I don't object to
> removing this, especially because the implementation doesn't seem to be
> complete.  In the future, if someone needs such a facility, they can
> first develop a complete version of this API.

Thanks for looking into that. Here's a pair of draft patches to disable and then remove dsm_resize() and dsm_remap().
--
Thomas Munro
http://www.enterprisedb.com

0001-Desupport-dsm_resize-and-dsm_remap.patch
Description: Binary data

0002-Remove-dsm_resize-and-dsm_remap.patch
Description: Binary data
Re: GetSnapshotData round two(for me)
On Tue, Sep 25, 2018 at 11:00 AM, Daniel Wood wrote:
> I was about to suggest creating a single shared snapshot instead of having
> multiple backends compute what is essentially the same snapshot.  Luckily,
> before posting, I discovered "Avoiding repeated snapshot computation" from
> Pavan and "POC: Cache data in GetSnapshotData()" from Andres.

I think Mithun has also worked on this [1] and posted some analysis about the cases where he saw an improvement as well as the cases where it did not perform well; you might want to have a look.

[1] https://www.postgresql.org/message-id/CAD__OujRZEjE5y3vfmmZmSSr3oYGZSHRxwDwF7kyhBHB2BpW_g%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
GetSnapshotData round two(for me)
I was about to suggest creating a single shared snapshot instead of having multiple backends compute what is essentially the same snapshot. Luckily, before posting, I discovered "Avoiding repeated snapshot computation" (https://www.postgresql.org/message-id/caboikdmsj4osxta7xbv2quhkyuo_4105fjf4n+uyroybazs...@mail.gmail.com) from Pavan, and "POC: Cache data in GetSnapshotData()" (https://www.postgresql.org/message-id/20150202152706.gd9...@alap3.anarazel.de) from Andres.

Andres, could I get a short summary of the biggest drawback that may have prevented this from being released?

Before I saw this I had done my own implementation and saw some promising results (25% on 48 cores). I do need to run some mixed RO and RW workloads to see how the invalidations of the shared copy, at EOT time, affect the results.

There are some differences in my implementation. I chose, perhaps incorrectly, to busy-spin other users trying to get a snapshot while the first one in builds the shared copy. My thinking is to avoid increasing the latency of taking the snapshot: the improvement of the idea doesn't come from getting off the CPU, by using a WAIT, but from not reading, on all the CPUs acquiring the snapshot, the PGXACT cache lines that are constantly being dirtied. One backend can do the heavy lifting and the others can immediately jump on the shared copy once it is created.

And something else quite weird: as I was evolving a standard setup for benchmark runs and getting baselines, I was getting horrible numbers sometimes (680K) and other times I'd get over 1 million QPS. I was thinking I had a bad machine. What I found was that even though I was running a fixed 192 clients, I had set max_connections to 600 sometimes and 1000 on other runs.
Here is what I see running select-only scale-1000 pgbench with 192 clients on a 48-core box (2 sockets), using different values for max_connections:

 200  tps = 1092043
 250  tps = 1149490
 300  tps =  732080
 350  tps =  719611
 400  tps =  681170
 450  tps =  687527
 500  tps =  859978
 550  tps =  927161
 600  tps = 1092283
 650  tps = 1154916
 700  tps = 1237271
 750  tps = 1195968
 800  tps = 1162221
 850  tps = 1140626
 900  tps =  749519
 950  tps =  648398
1000  tps =  653460

This is on the base 12x codeline. The only thought I've had so far is that each PGXACT in use (192 of them) is being scattered across the full set of max_connections slots, instead of being physically contiguous in the first 192 slots. This would cause more cache lines to be scanned. It doesn't make a lot of sense, though, given that throughput goes back up again from 500, peaking at 700. Also, this is after a fresh restart, so the procs in the freelist shouldn't have been scrambled yet in terms of ordering.

NOTE: I believe you'll only see this huge difference on a dual-socket machine. It'd probably only take 30 minutes or so on a big machine to confirm, with a couple of few-minute runs at different values for max_connections. I'll be debugging this soon, but I've been postponing it while experimenting with my shared snapshot code.
Re: SSL tests failing with "ee key too small" error on Debian SID
On Tue, Sep 25, 2018 at 12:48:57PM +0900, Kyotaro HORIGUCHI wrote:
> Do you mean that cert/key files are generated on-the-fly while
> running 'make check'?  It sounds reasonable as long as just
> replacing existing files with those with longer (2048-bit?) keys
> doesn't work for all supported platforms.

The files are present by default in the tree, but can be regenerated easily using the makefile rule "sslfiles". From what I can see, this is caused by OpenSSL 1.1.1, which Debian SID has visibly upgraded to recently; that's the version I have on my system. I have not dug much into the Makefile to see if things could get done right by changing the openssl commands, though..

--
Michael

signature.asc
Description: PGP signature
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Tue, Sep 25, 2018 at 01:49:09PM +1000, Haribabu Kommi wrote:
> Thanks for the review.
> Fixed in the attached patch as per your suggestion.

Hmm. I see a problem with the tests and the stability of what pg_stat_statements_reset() can return. Normally installcheck is disabled in contrib/pg_stat_statements/Makefile, but if you remove this barrier and run the tests with a server loading the module in shared_preload_libraries, then things are not stable. We don't have this kind of instability on HEAD. Some call to pg_stat_statements_reset() system-wide is visibly missing.

+	if (!pgss || !pgss_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_stat_statements must be loaded via shared_preload_libraries")));
This check can be within entry_reset().

+   the specified userid, dbid and queryid.  Returns the total number of
+   statement statistics that are reset based on the specified input.
+   If any of the parameter is not specified, the default value NULL(invalid)
Missing some markups for the three field names here, as well as for NULL, which is a value.

I can buy the compatibility breakage with the return result of pg_stat_statements_reset when specified without arguments.

Some nannyism: if all entries are removed and a new file needs to be written, you could save a bit of indentation by returning immediately when (num_entries != num_remove).

--
Michael

signature.asc
Description: PGP signature
PG vs macOS Mojave
Well, macOS 10.14 (Mojave) is out, so I installed it on a spare machine, and naturally the first thing I tried was to build PG with it. Our core code seems fine, but:

* --with-perl fails in configure, complaining that it can't find perl.h.

* --with-tcl fails in configure, complaining that it can't find tclConfig.sh. Furthermore, the historical workaround for that (--with-tclconfig=/System/Library/Frameworks/Tcl.framework) doesn't fix it.

After some investigation, it seems that Apple has been busy moving header files (not libraries) under SDK-specific "sysroot" directories, with the expectation that you'd compile using "-isysroot $SYSROOT". There's some mention of that here, for example:

https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/cross_development/Configuring/configuring.html#//apple_ref/doc/uid/1163i-CH1-SW7

The sysroot seems to contain only headers; stuff you need at runtime, such as shared libraries, is still where it used to be. The recommended way to get the appropriate sysroot path seems to be

SYSROOT=`xcodebuild -version -sdk macosx Path`

Attached is a draft patch to fix things up. The core ideas are:

(1) Stop assuming that the Perl headers and library are necessarily in the same place; create a perl_includedir variable to represent the path to the former.

(2) Tweak src/template/darwin to inject the appropriate -isysroot option into CPPFLAGS.

(3) Remove the need to manually specify the path to tclConfig.sh, which has gotten even more painful than before because now it's somewhere under the sysroot. You can still specify --with-tclconfig if you really want to, but it's not necessary anymore to build pltcl under recent macOS.

Note that (3) alone is not sufficient to fix pltcl; we must do (2) as well because tclConfig.sh now reports the Tcl include flags as

TCL_INCLUDE_SPEC="-iwithsysroot /System/Library/Frameworks/Tcl.framework/Versions/8.5/Headers"

so unless we also set -isysroot this doesn't work.
It's a bit scary to be adding -isysroot globally. I thought briefly about using it only while building pltcl, but that seems even more dangerous: if there were any discrepancies between the headers in the sysroot and those in the normal include directories, building pltcl with different headers from the rest of the system would surely be disastrous. In any case, I suspect that the handwriting is on the wall, and before very much longer it's going to be impossible to build meaningful code on macOS without -isysroot anyway.

I've tested this on all the macOS versions I have at hand, and it doesn't seem to break anything. Only part (1) could possibly affect other platforms, and that seems safe enough.

I'd like to commit and backpatch this, because otherwise longfin is going to start falling over when I upgrade its host to Mojave.

Thoughts?

regards, tom lane

diff --git a/configure b/configure
index 21ecd29..879eee4 100755
--- a/configure
+++ b/configure
@@ -668,6 +668,7 @@ python_majorversion
 PYTHON
 perl_embed_ldflags
 perl_embed_ccflags
+perl_includedir
 perl_useshrplib
 perl_privlibexp
 perl_archlibexp
@@ -9773,6 +9774,14 @@
 You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl." "$LINENO" 5
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+    if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+      perl_includedir="$PG_SYSROOT$perl_archlibexp"
+    fi
+  fi
   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for CFLAGS recommended by Perl" >&5
 $as_echo_n "checking for CFLAGS recommended by Perl... " >&6; }
@@ -18355,7 +18364,7 @@ fi # check for
 if test "$with_perl" = yes; then
   ac_save_CPPFLAGS=$CPPFLAGS
-  CPPFLAGS="$CPPFLAGS -I$perl_archlibexp/CORE"
+  CPPFLAGS="$CPPFLAGS -I$perl_includedir/CORE"
   ac_fn_c_check_header_compile "$LINENO" "perl.h" "ac_cv_header_perl_h" "#include "
 if test "x$ac_cv_header_perl_h" = xyes; then :
diff --git a/configure.in b/configure.in
index 8fe6894..530f275 100644
--- a/configure.in
+++ b/configure.in
@@ -1044,6 +1044,15 @@
 You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl.])
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+    if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+      perl_includedir="$PG_SYSROOT$perl_archlibexp"
+    fi
+  fi
+  AC_SUBST(perl_includedir)dnl
   PGAC_CHECK_PERL_EMBED_CCFLAGS
   PGAC_CHECK_PERL_EMBED_LDFLAGS
 fi
@@ -2229,7 +2238,7 @@ fi # check for
 if test "$with_perl" = yes; then
Re: when set track_commit_timestamp on, database system abort startup
Sawada-san,

On Mon, Sep 24, 2018 at 08:28:45AM +0900, Michael Paquier wrote:
> Wouldn't it be better to incorporate the new test as part of
> 004_restart.pl?  This way, we avoid initializing a full instance, which
> is always a good thing as that's very costly.  The top of this file also
> mentions that it tests clean restarts, but it bothers also about crash
> recovery.

I have been able to work on this bug, and rewrote the proposed test case as attached. The test can only go into v11 and HEAD. What do you think?

--
Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4754e75436..ddc999b687 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6833,11 +6833,13 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto for commit timestamps.  Activate the facility if the setting
+	 * is enabled in the control file, as there should be no tracking of
+	 * commit timestamps done when the setting was disabled.  This facility
+	 * can be started or stopped when replaying a XLOG_PARAMETER_CHANGE
+	 * record.
 	 */
-	if (ArchiveRecoveryRequested ?
-		ControlFile->track_commit_timestamp : track_commit_timestamp)
+	if (ControlFile->track_commit_timestamp)
 		StartupCommitTs();
 
 	/*
diff --git a/src/test/modules/commit_ts/t/004_restart.pl b/src/test/modules/commit_ts/t/004_restart.pl
index daf42d3a02..4efe30a559 100644
--- a/src/test/modules/commit_ts/t/004_restart.pl
+++ b/src/test/modules/commit_ts/t/004_restart.pl
@@ -1,4 +1,4 @@
-# Testing of commit timestamps preservation across clean restarts
+# Testing of commit timestamps preservation across restarts
 use strict;
 use warnings;
 use PostgresNode;
@@ -71,12 +71,36 @@
 is($after_restart_ts, $before_restart_ts,
 	'timestamps before and after restart are equal');
 
 # Now disable commit timestamps
-
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = off');
-
 $node_master->stop('fast');
+
+# Start the server, which generates a XLOG_PARAMETER_CHANGE record where
+# the parameter change is registered.
 $node_master->start;
+
+# Now restart again the server so as no record XLOG_PARAMETER_CHANGE are
+# replayed with the follow-up immediate shutdown.
+$node_master->restart;
+
+# Move commit timestamps across page boundaries.  Things should still
+# be able to work across restarts with those transactions committed while
+# track_commit_timestamp is disabled.
+$node_master->safe_psql('postgres',
+	qq(CREATE PROCEDURE consume_xid(cnt int)
+AS \$\$
+DECLARE
+	i int;
+BEGIN
+	FOR i in 1..cnt LOOP
+		EXECUTE 'SELECT txid_current()';
+		COMMIT;
+	END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node_master->safe_psql('postgres', 'CALL consume_xid(2000)');
+
 ($ret, $stdout, $stderr) = $node_master->psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($ret, 3, 'no commit timestamp from enable tx when cts disabled');
@@ -106,10 +130,12 @@ like(
 
 # Re-enable, restart and ensure we can still get the old timestamps
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = on');
-$node_master->stop('fast');
+
+# An immediate shutdown is used here.  At next startup recovery will
+# replay transactions which committed when track_commit_timestamp was
+# disabled, and the facility should be able to work properly.
+$node_master->stop('immediate');
 $node_master->start;
-
 my $after_enable_ts = $node_master->safe_psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($after_enable_ts, '', 'timestamp of enabled tx null after re-enable');

signature.asc
Description: PGP signature
Re: Proposal for Signal Detection Refactoring
On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
> PGDLLIMPORT changes don't get back-patched as well...

We've been more aggressive with that lately, and I think that's good. Being overly judicious in applying it is just an unnecessary portability barrier for extensions.

- Andres
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
So I've tried to rough out a decision tree for the various options on how this might be implemented (discarding the "use precedence hacks" option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

+: relatively straightforward parser changes
+: no new reserved words
+: has the option of working extensibly with all functions
-: base_yylex needs extending to 3 lookahead tokens

** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls

If the clauses are legal on all window functions, what to do about existing window functions for which the clauses do not make sense?

*** 1.1.1. Ignore the clause when the function isn't aware of it

+: simple
-: somewhat surprising for users perhaps?

*** 1.1.2. Change the behavior of the windowapi in some consistent way

Not sure if this can work.

+: fairly simple (maybe?) and predictable
-: changes the behavior of existing window functions

** 1.2. Allow from/ignore clause on only certain functions

+: avoids any unexpected behavior
-: needs some way to control what functions allow it

*** 1.2.1. Check the function name in parse analysis against a fixed list.

+: simple
-: not extensible

*** 1.2.2. Provide some option in CREATE FUNCTION

+: extensible
-: fairly intrusive, adding stuff to create function and pg_proc

*** 1.2.3. Do something magical with function argument types

+: doesn't need changes in create function / pg_proc
-: it's an ugly hack

* 2. reserve nth_value etc. as functions

+: follows the spec reasonably well
+: less of a hack than extending base_yylex
-: new reserved words
-: more parser rules
-: not extensible (now goto 1.2.1)

* 3. "just say no" to the spec

e.g. add new functions like lead_ignore_nulls(), or add extra boolean args to lead() etc. telling them to skip nulls

+: simple
-: doesn't conform to spec
-: using extra args isn't quite the right semantics

--
Andrew (irc:RhodiumToad)
Re: SSL tests failing with "ee key too small" error on Debian SID
Hello.

At Mon, 17 Sep 2018 22:13:40 +0900, Michael Paquier wrote in <20180917131340.ge31...@paquier.xyz>:
> Hi all,
>
> On a rather freshly-updated Debian SID server, I am able to see failures
> for the SSL TAP tests:
>
> 2018-09-17 22:00:27.389 JST [13072] LOG:  database system is shut down
> 2018-09-17 22:00:27.506 JST [13082] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
> 2018-09-17 22:00:27.506 JST [13082] LOG:  database system is shut down
> 2018-09-17 22:00:27.720 JST [13084] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
>
> Wouldn't it be better to rework the rules used to generate the different
> certificates and reissue them in the tree?  It seems to me that this is
> just waiting to fail on other platforms as well..

I agree that we could get into the same trouble sooner or later. Do you mean that cert/key files are generated on-the-fly while running 'make check'? That sounds reasonable, as long as just replacing the existing files with ones using longer (2048-bit?) keys doesn't work for all supported platforms.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Tue, Sep 25, 2018 at 1:39 AM Michael Paquier wrote:
> On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote:
> > Attached new rebased version of the patch that enhances the
> > pg_stat_statements_reset() function.  This needs to be applied on top
> > of the patch that is posted in [1].
>
> +CREATE ROLE stats_regress_user1;
> +CREATE ROLE stats_regress_user2;
> Just a short note: regression tests creating roles should use regress_
> as prefix.

Thanks for the review. Fixed in the attached patch as per your suggestion.

Regards,
Haribabu Kommi
Fujitsu Australia

0001-pg_stat_statements_reset-to-reset-specific-query-use_v6.patch
Description: Binary data
Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role
On Tue, Sep 25, 2018 at 10:58 AM Michael Paquier wrote:
> On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote:
> > For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
> > need to add a delta script not a new base script.  (If you had to
> > back-patch before v10, it might be best to add a new base script in all
> > the branches just to keep the patches consistent; but IIUC this issue
> > only arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
> > prototype instead.
>
> Of course, thanks.  Sorry for the incorrect reference pointing to a
> commit of REL9_6_STABLE.  As the patch only needs to be applied down to
> v10, there is no need to do anything more complicated than what Hari has
> proposed.  So, committed after a bit of comment and format tweaks.

Thanks for the changes and commit.

Regards,
Haribabu Kommi
Fujitsu Australia
Re: Missing const in DSA.
On Tue, Sep 25, 2018 at 7:46 AM Thomas Munro wrote:
> On Tue, Sep 25, 2018 at 4:17 AM Tom Lane wrote:
> > Thomas Munro writes:
> > > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane wrote:
> > >> Mark G writes:
> > >>> While looking at some of the recent churn in DSA I noticed that
> > >>> dsa_size_class_map should probably be declared const.
> >
> > >> +1 ... also, given the contents of the array, "char" seems like
> > >> rather a misnomer.  I'd be happier if it were declared as uint8, say.

Pushed. Thanks both for the code review!

--
Thomas Munro
http://www.enterprisedb.com
Re: Proposal for Signal Detection Refactoring
On Mon, Sep 24, 2018 at 10:39:40PM -0400, Tom Lane wrote:
> I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

Same question here. As that's kind of a separate discussion, I left it out. Now I don't mind changing that at the same time, as that's harmless and as this is only a patch for HEAD. PGDLLIMPORT changes don't get back-patched as well...

--
Michael

signature.asc
Description: PGP signature
Re: Proposal for Signal Detection Refactoring
Michael Paquier writes:
> Let's change it then.  ClientConnectionLost needs also to be changed as
> miscadmin.h tells that it could be used in a signal handler.  What do
> you think about the attached?

Looks reasonable to me (I've not tested, though). I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

regards, tom lane
Re: Proposal for Signal Detection Refactoring
On Mon, Sep 24, 2018 at 09:38:11PM -0400, Tom Lane wrote:
> Yeah, in principle any global variable touched by a signal handler should
> be sig_atomic_t.  I don't know of any modern platform where using "bool"
> is unsafe, but per the C standard it could be.  The case that would be
> worrisome is if setting the variable requires a load/modify/store, which
> does apply to char-sized variables on some ancient platforms.  I think
> there's no need to worry for int-sized variables.

Let's change it then. ClientConnectionLost needs also to be changed, as miscadmin.h tells that it could be used in a signal handler. What do you think about the attached?

--
Michael

diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f7d6617a13..5971310aab 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -27,11 +27,11 @@
 ProtocolVersion FrontendProtocol;
 
-volatile bool InterruptPending = false;
-volatile bool QueryCancelPending = false;
-volatile bool ProcDiePending = false;
-volatile bool ClientConnectionLost = false;
-volatile bool IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t InterruptPending = false;
+volatile sig_atomic_t QueryCancelPending = false;
+volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t ClientConnectionLost = false;
+volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e167ee8fcb..8ff106e051 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -77,13 +77,13 @@
 /* in globals.c */
 /* these are marked volatile because they are set by signal handlers: */
-extern PGDLLIMPORT volatile bool InterruptPending;
-extern PGDLLIMPORT volatile bool QueryCancelPending;
-extern PGDLLIMPORT volatile bool ProcDiePending;
-extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
+extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
+extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
+extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
-extern volatile bool ClientConnectionLost;
+extern volatile sig_atomic_t ClientConnectionLost;
 
 /* these are marked volatile because they are examined by signal handlers: */
 extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;

signature.asc
Description: PGP signature
Re: DNS SRV support for LDAP authentication
On Tue, Sep 25, 2018 at 2:09 PM Thomas Munro wrote:
> 2. Define a new zone for testing, by adding the following to the end
> 3. Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

Oops, I changed my testing domain name in the middle of my experiment, but pasted the older version into the previous message. Here are the corrected steps 2 and 3, consistent with the rest:

= end of /usr/local/etc/namedb/named.conf =
zone "my-domain.com" {
    type master;
    file "/usr/local/etc/namedb/master/my-domain.com";
};
=

= /usr/local/etc/namedb/master/my-domain.com =
$TTL	10
@	IN	SOA	ns.my-domain.com. admin.my-domain.com. (
		2	; Serial
		604800	; Refresh
		86400	; Retry
		2419200	; Expire
		604800 )	; Negative Cache TTL
	IN	NS	ns.my-domain.com.
ns.my-domain.com.	IN	A	127.0.0.1
my-domain.com.	IN	A	127.0.0.1
ldap-server.my-domain.com.	IN	A	127.0.0.1
_ldap._tcp.my-domain.com.	IN	SRV	0 0 389	ldap-server
=

--
Thomas Munro
http://www.enterprisedb.com
RE: Changing the setting of wal_sender_timeout per standby
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Okay, I have pushed the patch with all your suggestions included.

Thanks so much!

Regards
Takayuki Tsunakawa
DNS SRV support for LDAP authentication
Hello hackers,

Some people like to use DNS SRV records to advertise LDAP servers on their network. Microsoft Active Directory is usually (always?) set up that way. Here is a patch to allow our LDAP auth module to support that kind of discovery. It copies the convention of the OpenLDAP command line tools: if you give it a URL that has no hostname, it'll try to extract a domain name from the bind DN, and then ask your DNS server for a SRV record for LDAP-over-TCP at that domain. The OpenLDAP version of libldap.so exports the magic to do that, so the patch is very small (but the infrastructure set-up to test it is a bit of a schlep, see below). I'll add this to the next Commitfest.

Testing instructions (paths and commands given for FreeBSD, adjust as appropriate):

1. Install BIND:

$ sudo pkg install bind99

2. Define a new zone for testing, by adding the following to the end of /usr/local/etc/namedb/named.conf:

= 8< =
zone "my.test.domain" {
    type master;
    file "/usr/local/etc/namedb/master/my.test.domain";
};
= 8< =

3. Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

= 8< =
$TTL	10
@	IN	SOA	ns.my.test.domain. admin.my.test.domain. (
		2	; Serial
		604800	; Refresh
		86400	; Retry
		2419200	; Expire
		604800 )	; Negative Cache TTL
	IN	NS	ns.my.test.domain.
ns.my.test.domain.	IN	A	127.0.0.1
my.test.domain.	IN	A	127.0.0.1
ldap-server.my.test.domain.	IN	A	127.0.0.1
_ldap._tcp.my.test.domain.	IN	SRV	0 0 389	ldap-server
= 8< =

4. Start up BIND:

# service named onestart

5. Confirm that SRV lookups find our record:

$ dig @localhost _ldap._tcp.my-domain.com SRV
...
;; ANSWER SECTION:
_ldap._tcp.my-domain.com. 10	IN	SRV	0 0 389 ldap-server.my-domain.com.

6. Tell your system libraries to use this DNS server by temporarily changing /etc/resolv.conf to say:

= 8< =
nameserver 127.0.0.1
= 8< =

7. Confirm that the OpenLDAP tools can look that SRV record up:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcom'

(That's "ou=blah,dc=my-domain,dc=com" URL-encoded, from which "my-domain.com" will be extracted.) You should see that it's trying to connect to ldap-server port 389, and you can stick 'x' on the end of it to see what it looks like when it can't find a SRV record, as a sanity check:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcomx'
DNS SRV: Could not turn domain=my-domain.comx into a hostlist

8. Set up an LDAP server listening on localhost port 389, and create a user, such that you can actually authenticate from PostgreSQL with it. Gory details omitted. First test that you can log in with LDAP authentication when using a pg_hba.conf line like this:

host all fred 127.0.0.1/32 ldap ldapurl="ldap://ldap-server.my-domain.com/dc=my-domain,dc=com?cn?sub"

9. Next apply the patch and verify that you can take out the hostname and let it be discovered via DNS SRV:

host all fred 127.0.0.1/32 ldap ldapurl="ldap:///dc=my-domain,dc=com?cn?sub"

(You can stick some elog(LOG, ...) lines into InitializeLDAPConnection() if you want to check that ldap_domain2hostlist() is in fact finding the hostname and port.)

This is a first draft, not tested much yet. I wonder if HAVE_LDAP_INITIALIZE is a reasonable way to detect OpenLDAP. The documentation was written in about 7 seconds, so it probably needs work. There is probably a Windowsy way to do this too, but I didn't look into that.

--
Thomas Munro
http://www.enterprisedb.com

0001-Add-DNS-SRV-support-for-LDAP-server-discovery.patch
Description: Binary data
Re: [patch] Bug in pg_dump/pg_restore using --no-publication
On Fri, Sep 21, 2018 at 05:44:02PM +0200, Gilles Darold wrote: > Attached is a patch that fixes a bug in pg_dump since 10.0 and > reproducible in master. When using option --no-publication : ALTER > PUBLICATION orders are still present in the dump. Thanks for the report, the patch, and the test case, Gilles! > pg_restore --no-publication test.dump -l test.dump| grep PUBLICATION > 2230; 6106 16389 PUBLICATION TABLE public p1 t1 > 2231; 6106 16392 PUBLICATION TABLE public p_insert_only t1 This command does not actually work ;) > Should I add it to current commitfest ? No need to. I have committed your patch down to v10 after making sure that we are not missing any other spots, and testing each branch manually. -- Michael signature.asc Description: PGP signature
Re: Proposal for Signal Detection Refactoring
Michael Paquier writes: > At the same time, all the pending flags in miscadmin.h could be switched > to sig_atomic_t if we were to be correct, no? The counters could be > higher than 256 so that's not really possible. Yeah, in principle any global variable touched by a signal handler should be sig_atomic_t. I don't know of any modern platform where using "bool" is unsafe, but per the C standard it could be. The case that would be worrisome is if setting the variable requires a load/modify/store, which does apply to char-sized variables on some ancient platforms. I think there's no need to worry for int-sized variables. regards, tom lane
Re: Proposal for Signal Detection Refactoring
On Mon, Sep 24, 2018 at 09:03:49PM -0400, Tom Lane wrote: > You could only fix that by blocking all signal handling during the > handler, which would be expensive and rather pointless. > > I do not think that it's readily possible to improve on the current > situation with one sig_atomic_t per flag. Okay, thanks for the confirmation. At the same time, all the pending flags in miscadmin.h could be switched to sig_atomic_t if we were to be correct, no? The counters could be higher than 256 so that's not really possible. -- Michael signature.asc Description: PGP signature
Re: Proposal for Signal Detection Refactoring
Michael Paquier writes: > And then within separate signal handlers things like: > void > StatementCancelHandler(SIGNAL_ARGS) > { > [...] > signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY; > [...] > } AFAICS this still wouldn't work. The machine code is still going to look (on many machines) like "load from signalPendingFlags, OR in some bits, store to signalPendingFlags". So there's still a window for another signal handler to interrupt that and store some bits that will get lost. You could only fix that by blocking all signal handling during the handler, which would be expensive and rather pointless. I do not think that it's readily possible to improve on the current situation with one sig_atomic_t per flag. regards, tom lane
Re: Proposal for Signal Detection Refactoring
On Mon, Sep 24, 2018 at 05:23:56PM -0700, Andres Freund wrote: > On 2018-09-25 08:57:25 +0900, Michael Paquier wrote: >> Anyway, putting the back-patching pain aside, and just for my own >> knowledge... Andres, would it be fine to just use one sig_atomic_t >> field which can be set from different code paths? Say: >> typedef enum SignalPendingType { >> PENDING_INTERRUPT, >> PENDING_CANCEL_QUERY, >> PENDING_PROC_DIE, >> PENDING_RELOAD, >> PENDING_SESSION_TIMEOUT >> }; > > Well, they'd have to be different bits... Sure, I forgot to write the "foo = 1 << 0" and such ;) >> extern volatile sig_atomic_t signalPendingFlags; > > Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits > wide - so you couldn't have that many flags. I see. I thought that it mapped to at least an int, but C99 says that it can be as small as a char. That's indeed quite limited... -- Michael signature.asc Description: PGP signature
Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role
On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote: > For v10 and up, the method used in 53b79ab4 is overcomplicated: you only > need to add a delta script not a new base script. (If you had to > back-patch before v10, it might be best to add a new base script in all > the branches just to keep the patches consistent; but IIUC this issue only > arises in v10 and up.) I'd consider following, eg, 7f563c09f as a > prototype instead. Of course, thanks. Sorry for the incorrect reference pointing to a commit of REL9_6_STABLE. As the patch only needs to be applied down to v10, there is no need to do anything more complicated than what Hari has proposed. So, committed after a bit of comment and format tweaks. -- Michael signature.asc Description: PGP signature
Re: Implementing SQL ASSERTION
> "Joe" == Joe Wildish writes: Joe> Agreed. My assumption was that we would record in the data Joe> dictionary the behaviour (or “polarity") of each aggregate Joe> function with respect to the various operators. Column in Joe> pg_aggregate? I don’t know how we’d record it exactly. I haven't looked at the background of this, but if what you want to know is whether the aggregate function has the semantics of min() or max() (and if so, which) then the place to look is pg_aggregate.aggsortop. (For a given aggregate foo(x), the presence of an operator oid in aggsortop means something like "foo(x) is equivalent to (select x from ... order by x using OP limit 1)", and the planner will replace the aggregate by the applicable subquery if it thinks it'd be faster.) As for operators, you can only make assumptions about their meaning if the operator is a member of some opfamily that assigns it some semantics. For example, the planner can assume that WHERE x=y AND x=1 implies that y=1 (assuming x and y are of appropriate types) not because it assumes that "=" is the name of a transitive operator, but because the operators actually selected for (x=1) and (x=y) are both "equality" members of the same btree operator family. Likewise proving that (a>2) implies (a>1) requires knowing that > is a btree comparison op. -- Andrew (irc:RhodiumToad)
Re: Proposal for Signal Detection Refactoring
On 2018-09-25 08:57:25 +0900, Michael Paquier wrote: > On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote: > > This doesn't seem to solve an actual problem, why are we discussing > > changing this? What'd be measurably improved, worth the cost of making > > backpatching more painful? > > My point was just to reduce the number of variables used and ease > debugger lookups with what is on the stack. I'm not sure a bitflag really gives you that - before, gdb gives you the plain value; afterwards you need to know the enum values and do bit math to know. > Anyway, putting the back-patching pain aside, and just for my own > knowledge... Andres, would it be fine to just use one sig_atomic_t > field which can be set from different code paths? Say: > typedef enum SignalPendingType { > PENDING_INTERRUPT, > PENDING_CANCEL_QUERY, > PENDING_PROC_DIE, > PENDING_RELOAD, > PENDING_SESSION_TIMEOUT > }; Well, they'd have to be different bits... > extern volatile sig_atomic_t signalPendingFlags; Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits wide - so you couldn't have that many flags. Greetings, Andres Freund
Re: Proposal for Signal Detection Refactoring
On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote: > This doesn't seem to solve an actual problem, why are we discussing > changing this? What'd be measurably improved, worth the cost of making > backpatching more painful? My point was just to reduce the number of variables used and ease debugger lookups with what is on the stack. Anyway, putting the back-patching pain aside, and just for my own knowledge... Andres, would it be fine to just use one sig_atomic_t field which can be set from different code paths? Say: typedef enum SignalPendingType { PENDING_INTERRUPT, PENDING_CANCEL_QUERY, PENDING_PROC_DIE, PENDING_RELOAD, PENDING_SESSION_TIMEOUT }; extern volatile sig_atomic_t signalPendingFlags; And then within separate signal handlers things like: void StatementCancelHandler(SIGNAL_ARGS) { [...] signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY; [...] } void PostgresSigHupHandler(SIGNAL_ARGS) { [...] signalPendingFlags |= ConfigReloadPending; [...] } -- Michael signature.asc Description: PGP signature
Re[2]: Adding a note to protocol.sgml regarding CopyData
Thanks for the feedback. On 2018-09-22, Amit Kapila wrote ... > ... Why can't we just extend the current Note where it is currently ... Because information about how the protocol works belongs in the protocol documentation not in the documentation for one implementation of the protocol. Think of it this way, if the only full explanation of this information was in the psqlODBC or pgJDBC documentation would you feel comfortable just referencing it from protocol.sgml? I would not and, in my opinion, libpq's being the reference client implementation should not change that. On top of that, in the libpq documentation the termination line is only mentioned in a section titled "Obsolete Functions for COPY" which makes it even less likely that someone working on a different implementation of the protocol will notice it. [strong opinion - I would object to leaving the only description in the libpq documentation] > ... why do we want to duplicate the same information ... The change to the CopyData message documentation does refer back to the full description. My intent with the brief description of the \. line was to include enough information so that the reader could decide whether or not skipping back to the full description would be useful. I think that usefulness outweighs the minor duplication. [moderate opinion - I plan to leave it as is unless others weigh in in favor of just keeping the reference] But given that I don't work on libpq or even use it, I'm not comfortable changing the documentation of 4 different libpq methods (even obsolete methods) on my own initiative. If the committer who picks this up wants the libpq documentation changed as part of this, that would be different and I'd be willing to give it a shot. [no strong feelings one way or the other - I would leave the libpq documentation as is but could easily be swayed] > ... duplicate the same information in different words at three different places ... I count 7 different places. 
In the protocol docs, there is the old mention in the "Summary of Changes since Protocol 2.0" section and the two new mentions in the protocol definitions plus after reading through libpq-copy.html again, I think all 4 of these sections refer to the terminating line/end-of-copy-data marker. PQgetline - "... the application must check to see if a new line consists of the two characters \., which indicates ..." PQgetlineAsync - "... returns -1 if the end-of-copy-data marker has been recognized ..." PQputline - "... Note ...send the two characters \. as a final line ..." PQendcopy - "... followed by PQendcopy after the terminator line is seen ..." [informational - lists references to remove when a future protocol drops the \. line entirely]
Re: Implementing SQL ASSERTION
Hi Peter, > On 24 Sep 2018, at 15:06, Peter Eisentraut > wrote: > > On 29/04/2018 20:18, Joe Wildish wrote: >> >> Attached is a rebased patch for the prototype. > > I took a look at this. Thank you for reviewing. > This has been lying around for a few months, so it will need to be > rebased again. > > 8< - - - snipped for brevity - - - 8< > > All this new code in constraint.c that checks the assertion expression > needs more comments and documentation. All agreed. I’ll give the patch some TLC and get a new version that addresses the above. > Stuff like this isn't going to work: > > static int > funcMaskForFuncOid(Oid funcOid) > { >char *name = get_func_name(funcOid); > >if (name == NULL) >return OTHER_FUNC; >else if (strncmp(name, "min", strlen("min")) == 0) >return MIN_AGG_FUNC; >else if (strncmp(name, "max", strlen("max")) == 0) >return MAX_AGG_FUNC; > > You can't assume from the name of a function what it's going to do. > Solving this properly might be hard. Agreed. My assumption was that we would record in the data dictionary the behaviour (or “polarity") of each aggregate function with respect to the various operators. Column in pg_aggregate? I don’t know how we’d record it exactly. A bitmask would be a possibility. Also, I don’t know what we’d do with custom aggregate functions (or indeed custom operators). Allowing end users to determine the value would potentially lead to assertion checks being incorrectly skipped. Maybe we’d say that custom aggregates always have a neutral polarity and are therefore not subject to this optimisation. > This ought to be reproducible for you if you build with assertions. Yes. I shall correct this when I do the aforementioned rebase and application of TLC. > My feeling is that if we want to move forward on this topic, we need to > solve the concurrency question first.
All these optimizations for when > we don't need to check the assertion are cool, but they are just > optimizations that we can apply later on, once we have solved the > critical problems. I obviously agree that the concurrency issue needs solving. But I don’t see that at all as a separate matter from the algos. Far from being merely optimisations, the research indicates we can go a lot further toward reducing the need for rechecks and, therefore, reducing the chance of concurrency conflicts from occurring in the first place. This is true regardless of whatever mechanism we use to enforce correct behaviour under concurrent modifications -- e.g. a lock on the ASSERTION object itself, enforced use of SERIALIZABLE, etc. By way of example (lifted directly from the AM4DP book): CREATE TABLE employee ( id INTEGER PRIMARY KEY, dept INTEGER NOT NULL, job TEXT NOT NULL ); CREATE ASSERTION department_managers_need_administrators CHECK (NOT EXISTS (SELECT dept FROM employee a WHERE EXISTS (SELECT * FROM employee b WHERE a.dept = b.dept AND b.job IN ('Manager', 'Senior Manager')) AND NOT EXISTS (SELECT * FROM employee b WHERE a.dept = b.dept AND b.job = 'Administrator'))); The current implementation derives "DELETE(employee), INSERT(employee) and UPDATE(employee.dept, employee.job)" as the set of invalidating operations and triggers accordingly. However, in this case, we can supplement the triggers by having them inspect the transition tables to see if the actual data from the triggering DML statement could in fact affect the truth of the expression: specifically, only do the recheck on DELETE of an "Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when the new job is a "Manager" or "Senior Manager" or the old job was an "Administrator". Now, if this is a company with 10,000 employees, and would therefore presumably only require a handful of managers, right? 
;-), then the potential for a concurrency conflict is massively reduced when compared to rechecking every time the employee table is touched. (This optimisation has some caveats and is reliant upon being able to derive the key of an expression from the underlying base tables plus some stuff about functional dependencies. I have started work on it but sadly not had time to progress it in recent months). Having said all that: there are obviously going to be some expressions that cannot be proven to have no potential for invalidating the assertion truth. I guess this is the prime concern from a concurrency PoV? Example: CREATE TABLE t ( b BOOLEAN NOT NULL, n INTEGER NOT NULL, PRIMARY KEY (b, n) ); CREATE ASSERTION sum_per_b_less_than_10 CHECK (NOT EXISTS (SELECT FROM (SELECT b, SUM(n) FROM t GROUP BY b) AS v(b, sum_n) WHERE sum_n > 10)); Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the interesting case, from a concurrency perspective, is how do we avoid an INSERT WHERE b
Re: Calculate total_table_pages after set_base_rel_sizes()
David Rowley said: > I believe that we should be delaying the PlannerInfo's > total_table_pages calculation until after constraint exclusion and > partition pruning have taken place. Doing this calculation before we > determine which relations we don't need to scan can lead to > incorrectly applying random_page_cost to too many pages processed > during an Index Scan. > > We already don't count relations removed by join removals from this > calculation, so counting pruned partitions seems like an omission. > > The attached patch moves the calculation to after set_base_rel_sizes() > is called and before set_base_rel_pathlists() is called, where the > information is actually used. > > I am considering this a bug fix, but I'm proposing this for PG12 only > as I don't think destabilising plans in the back branches is a good > idea. I'll add this to the September commitfest. Hi David, I had a quick look at this. (I haven't tested it so this isn't a full review.) It looks like a fairly straightforward code move. And I think it's correct to exclude the pages from partitions that won't be read. I have a small tweak. In make_one_rel, we currently have: /* * Compute size estimates and consider_parallel flags for each base rel, * then generate access paths. */ set_base_rel_sizes(root); set_base_rel_pathlists(root); Your patch inserts code between the two lines. I think the comment should be split too. /* Compute size estimates and consider_parallel flags for each base rel. */ set_base_rel_sizes(root); // NEW CODE /* Generate access paths. */ set_base_rel_pathlists(root); Cheers, Edmund
Re: auto_explain: Include JIT output if applicable
On Mon, Sep 24, 2018 at 1:48 PM, Andres Freund wrote: > > Thanks for noticing - pushed! > Thanks! Best, Lukas -- Lukas Fittl
Re: PATCH: Update snowball stemmers
Arthur Zakirov writes: > Ah, I see. I attached new version made with --no-renames. Will wait for > what cfbot will say. I reviewed and pushed this. As a cross-check on the patch, I cloned the Snowball github repo and built the derived files in it. I noticed that they'd incorporated several new stemmers since 2007 --- not only your Nepali one, but half a dozen more besides. Since the point here is (IMO) mostly to follow their lead on what's interesting, I went ahead and added those as well. In short, therefore, the commit includes the Nepali stuff from your other thread as well as what was in this one. Although I added nepali.stop from the other patch, I've not done anything about updating our other stopword lists. Presumably those are a bit obsolete by now as well. I wonder if we can prevail on the Snowball people to make those available in some less painful way than scraping them off assorted web pages. Ideally they'd stick them into their git repo ... regards, tom lane
Re: Collation versioning
On Mon, Sep 24, 2018 at 1:47 PM Thomas Munro wrote: > Personally I'm not planning to work on multi-version installation any > time soon, I was just scoping out some basic facts about all this. I > think the primary problem that affects most of our users is the > shifting-under-your-feet problem, which we now see applies equally to > libc and libicu. Are we sure about that? Could it just be that ICU will fix bugs that cause their strcoll()-alike and strxfrm()-alike functions to give behavior that isn't consistent with the behavior required by the CLDR version in use? This seems like it might be a very useful distinction. We know that glibc had bugs that were caused by strxfrm() not agreeing with strcoll() -- that was behind the 9.5-era abbreviated keys issues. But that was actually a bug in an optimization in strcoll(), rather than a strxfrm() bug. strxfrm() gave the correct answer, which is to say the answer that was right according to the high level collation definition. It merely failed to be bug-compatible with strcoll(). What's ICU supposed to do about an issue like that? If we're going to continue to rely on the strxfrm() equivalent from ICU, then it seems to me that ICU should be able to change behaviors in a stable release, provided the behavior they're changing is down to a bug in their infrastructure, as opposed to an organic evolution in how some locale sorts text (CLDR update). My understanding is that ICU is designed to decouple technical issues with issues of concern to natural language experts, so we as an ICU client can limit ourselves to worrying about one of the two at any given time. -- Peter Geoghegan
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
On 2018-Sep-24, Sergei Kornilov wrote: > Hi > > > An autovacuum can't be just aggressive; it's either anti-wraparound or > > normal. > But autovacuum _can_ be aggressive and not anti-wraparound. > I build current master and can see 3 different line types: > 2018-09-24 23:47:31.500 MSK 27939 @ from [vxid:4/272032 txid:0] [] LOG: > automatic aggressive vacuum of table "postgres.public.foo": index scans: 0 > 2018-09-24 23:49:27.892 MSK 28333 @ from [vxid:4/284297 txid:0] [] LOG: > automatic aggressive vacuum to prevent wraparound of table > "postgres.public.foo": index scans: 0 > 2018-09-24 23:49:29.093 MSK 28337 @ from [vxid:4/284412 txid:0] [] LOG: > automatic vacuum of table "postgres.public.foo": index scans: 0 Exactly. It cannot be anti-wraparound and not aggressive, which is the line type not shown. "Aggressive" means it scans all pages; "anti-wraparound" means it does not let itself be cancelled because of another process waiting for a lock on the table. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Include application_name in "connection authorized" log message
Greetings Don! * Don Seiler (d...@seiler.us) wrote: > On Tue, Aug 7, 2018 at 12:32 PM Tom Lane wrote: > > Don Seiler writes: > > > > > 1. We want to make a generic, central ascii-lobotomizing function similar > > > to check_application_name that we can re-use there and for other checks > > (eg > > > user name). > > > 2. Change check_application_name to call this function (or just call this > > > function instead of check_application_name()?) > > > > check_application_name's API is dictated by the GUC check-hook interface, > > and doesn't really make sense for this other use. So the first part of > > that, not the second. > > > > > 3. Call this function when storing the value in the port struct. > > > > I'm not sure where exactly is the most sensible place to call it, > > but trying to minimize the number of places that know about this > > kluge seems like a good principle. > > OK I created a new function called clean_ascii() in common/string.c. I call > this from my new logic in postmaster.c as well as replacing the logic in > guc.c's check_application_name() and check_cluster_name(). Since we're putting it into common/string.c (which seems pretty reasonable to me, at least), I went ahead and changed it to be 'pg_clean_ascii'. I didn't see any other obvious cases where we could use this function (though typecmds.c does have an interesting ASCII check for type categories..). Otherwise, I added some comments, added application_name to the replication 'connection authorized' messages (seems like we really should be consistent across all of them...), ran it through pgindent, and updated a variable name or two here and there. > I've been fighting my own confusion with git and rebasing and fighting the > same conflicts over and over and over, but this patch should be what I > want. If anyone has time to review my git process, I would appreciate it. 
I > must be doing something wrong to have these same conflicts every time I > rebase (or I completely misunderstand what it actually does). I'd be happy to chat about it sometime, of course, just have to find time when we both have a free moment. :) Attached is the updated patch. If you get a chance to look over it again and make sure it looks good to you, that'd be great. I did a bit of testing of it myself but wouldn't complain if someone else wanted to also. One thing I noticed while testing is that our 'disconnection' message still emits 'database=' for replication connections even though the 'connection authorized' message doesn't (presumably because we realized it's a bit silly to do so when we say 'replication connection'...). Seems like it'd be nice to have the log_connection / log_disconnection messages have some consistency about them but that's really a different discussion from this. Thanks! Stephen From 7151ee89e1663a762f928f33ad4023e70257ef9e Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 24 Sep 2018 15:59:50 -0400 Subject: [PATCH] Add application_name to connection authorized msg The connection authorized message has quite a bit of useful information in it, but didn't include the application_name (when provided), so let's add that as it can be very useful. Note that at the point where we're emitting the connection authorized message, we haven't processed GUCs, so it's not possible to get this by using log_line_prefix (which pulls from the GUC). There's also something to be said for having this included in the connection authorized message and then not needing to repeat it for every line, as having it in log_line_prefix would do. The GUC cleans the application name to pure-ascii, so do that here too, but pull out the logic for cleaning up a string into its own function in common and re-use it from those places, and check_cluster_name which was doing the same thing. 
Author: Don Seiler Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com --- src/backend/postmaster/postmaster.c | 16 + src/backend/utils/init/postinit.c | 54 - src/backend/utils/misc/guc.c| 17 ++--- src/common/string.c | 19 ++ src/include/common/string.h | 1 + src/include/libpq/libpq-be.h| 7 6 files changed, 84 insertions(+), 30 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 305ff36258..41de140ae0 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -99,6 +99,7 @@ #include "catalog/pg_control.h" #include "common/file_perm.h" #include "common/ip.h" +#include "common/string.h" #include "lib/ilist.h" #include "libpq/auth.h" #include "libpq/libpq.h" @@ -2096,6 +2097,21 @@ retry1: pstrdup(nameptr)); port->guc_options = lappend(port->guc_options, pstrdup(valptr)); + +/* + * Copy application_name to port if we come across it. This + * is done so we can log the application
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
Hi > An autovacuum can't be just aggressive; it's either anti-wraparound or normal. But autovacuum _can_ be aggressive and not anti-wraparound. I build current master and can see 3 different line types: 2018-09-24 23:47:31.500 MSK 27939 @ from [vxid:4/272032 txid:0] [] LOG: automatic aggressive vacuum of table "postgres.public.foo": index scans: 0 2018-09-24 23:49:27.892 MSK 28333 @ from [vxid:4/284297 txid:0] [] LOG: automatic aggressive vacuum to prevent wraparound of table "postgres.public.foo": index scans: 0 2018-09-24 23:49:29.093 MSK 28337 @ from [vxid:4/284412 txid:0] [] LOG: automatic vacuum of table "postgres.public.foo": index scans: 0 regards, Sergei
Re: auto_explain: Include JIT output if applicable
Hi, On 2018-09-24 11:34:38 -0700, Lukas Fittl wrote: > Hi, > > Whilst playing around with auto_explain and JIT today, I realized that > auto_explain currently doesn't output JIT information, which is rather > unfortunate when analyzing a larger set of queries in a semi-automated > manner. > > Attached a trivial patch that fixes the issue and adds JIT information to > auto_explain with the same logic as used for regular EXPLAIN. Thanks for noticing - pushed! It's pretty annoying that so much of this code is duplicated in auto_explain. It'd be good if we refactored explain.c so that there's less duplication. But that seems like it'd not be v11 work, so... - Andres
Re: Collation versioning
On Tue, Sep 25, 2018 at 4:26 AM Douglas Doole wrote: > On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro > wrote: >> Admittedly that creates a whole can >> of worms for initdb-time catalog creation, package maintainers' jobs, >> how long old versions have to be supported and how you upgraded >> database objects to new ICU versions. > > Yep. We never came up with a good answer for that before I left IBM. At the > time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as > part of the install bundle. > > Long term, I think the only viable approach to supporting multiple versions > of ICU is runtime loading of the libraries. Then it's up to the system > administrator to make sure the necessary versions are installed on the system. I wonder if we would be practically constrained to using the distro-supplied ICU (by their policies of not allowing packages to ship their own copies of ICU); it seems like it. I wonder which distros allow multiple versions of ICU to be installed. I see that Debian 9.5 only has 57 in the default repo, but the major version is in the package name (what is the proper term for that kind of versioning?) and it doesn't declare a conflict with other versions, so that's promising. Poking around with nm I noticed also that both the RHEL and Debian ICU libraries have explicitly versioned symbol names like "ucol_strcollUTF8_57", which is also promising. FreeBSD seems to have used "--disable-renaming" and therefore defines only "ucol_strcollUTF8"; doh. This topic is discussed here: http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library Personally I'm not planning to work on multi-version installation any time soon, I was just scoping out some basic facts about all this. I think the primary problem that affects most of our users is the shifting-under-your-feet problem, which we now see applies equally to libc and libicu.
>> Yeah, it seems like ICU is *also* subject to minor changes that happen >> under your feet, much like libc. For example maintenance release 60.2 >> (you can't install that at the same time as 60.1, but you can install >> it at the same time as 59.2). You'd be linked against libicu.so.60 >> (and thence libicudata.so.60), and it gets upgraded in place when you >> run the local equivalent of apt-get upgrade. > > This always worried me because an unexpected collation change is so painful > for a database. And I was never able to think of a way of reliably testing > compatibility either because of ICU's ability to reorder and group characters > when collating. I think the best we can do is to track versions per dependency (ie record it when the CHECK is created, when the index is created or rebuilt, ...) and generate loud warnings until you've dealt with each version dependency. That's why I've suggested we could consider sticking it on pg_depend (though I have apparently failed to convince Stephen so far). I think something like that is better than the current collversion design, which punts the problem to the DBA: "hey, human, there might be some problems, but I don't know where! Please tell me when you've fixed them by running ALTER COLLATION ... REFRESH VERSION!" instead of having the computer track of what actually needs to be done on an object-by-object basis and update the versions one-by-one automatically when the problems are resolved. -- Thomas Munro http://www.enterprisedb.com
Re: fast default vs triggers
On 09/20/2018 09:04 AM, Tomas Vondra wrote: On 09/19/2018 10:35 PM, Andrew Dunstan wrote: On 09/18/2018 03:36 PM, Andrew Dunstan wrote: Tomas Vondra has pointed out to me that there's an issue with triggers not getting expanded tuples for columns with fast defaults. Here is an example that shows the issue: andrew=# create table blurfl (id int); CREATE TABLE andrew=# insert into blurfl select x from generate_series(1,5) x; INSERT 0 5 andrew=# alter table blurfl add column x int default 100; ALTER TABLE andrew=# create or replace function showmej() returns trigger language plpgsql as $$ declare j json; begin j := to_json(old); raise notice 'old x: %', j->>'x'; return new; end; $$; CREATE FUNCTION andrew=# create trigger show_x before update on blurfl for each row execute procedure showmej(); CREATE TRIGGER andrew=# update blurfl set id = id where id = 1; NOTICE: old x: UPDATE 1 andrew=# update blurfl set id = id where id = 1; NOTICE: old x: 100 UPDATE 1 andrew=# The error is fixed with this patch: diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 2436692..f34a72a 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -3396,7 +3396,11 @@ ltrmark:; LockBuffer(buffer, BUFFER_LOCK_UNLOCK); } - result = heap_copytuple(&tuple); + if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts) + result = heap_expand_tuple(&tuple, relation->rd_att); + else + result = heap_copytuple(&tuple); + ReleaseBuffer(buffer); return result; I'm going to re-check the various places that this might have been missed. I guess it belongs on the open items list. I haven't found anything further that is misbehaving. I paid particular attention to indexes and index-only scans. I propose to commit this along with an appropriate regression test. Seems reasonable to me. This exposed a further issue with nulls in certain positions. A fix for both issues, and some regression tests, has been committed. Thanks to Tomas for his help. 
cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
> On Sep 24, 2018, at 1:29 PM, Andres Freund wrote: > > I'm very doubtful this is an improvement. Especially with the upcoming > pluggable storage work making vacuumlazy.c heap specific, while vacuum.c > stays generic. The concept of something like > PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much > (even if criteria for it might). That’s already a problem since vacuum logging is spread all over while autovac logging is not. Perhaps there needs to be some sort of vacuum_log() function that immediately provides output for manual vacuums, but aggregates output for autovac. AFAIK that’s the only real reason for autovac logging being a special case today.
Re: Participate in GCI as a Mentor
Thanks Sarah, I have received the emails and invitation. Now, I am reviewing all stuff and will respond with my ideas within a day. Regards Tahir On Tue, Sep 25, 2018 at 12:34 AM Sarah Conway Schnurr < xenophene...@gmail.com> wrote: > Tahir, > > I've extended an invitation through GCI for you to become a mentor for the > PostgreSQL > organization, and have sent an additional email containing information on > how to get started > with contributing to this year's contest. We have one month left to get > over a hundred tasks > uploaded, so that is our current priority! > > The PostgreSQL wiki link has the current list of tasks available as well > as a full list of other > mentors in the program for this year, found here: > https://wiki.postgresql.org/wiki/GCI_2018 > > Thank you very much for your interest - we're thrilled to have you be a > part of GCI 2018! > > Sarah > > On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan < > tahirram...@alumni.vu.edu.pk> wrote: > >> Honorable Concern, >> >> I want to join GCI as a mentor, please guide me about the procedure, >> thanks in anticipation. 
>> >> -- >> Regards >> Tahir Ramzan >> MSCS Research Scholar >> Google Summer of Code 2015 (CiviCRM) >> Google Summer of Code 2016 (ModSecurity) >> Outside Collaborator of SpiderLabs (Powered by TrustWave) >> Google Android Students Club Facilitator and Organizer 2015 >> >> Contact: >> >> +92-312-5518018 <+92%20312%205518018> >> >> tahirram...@alumni.vu.edu.pk >> >> >> More details about me and my work: >> >> GitHub Profile: https://github.com/tahirramzan >> >> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan >> > > > -- > Sarah Conway Schnurr > -- Regards Tahir Ramzan MSCS Research Scholar Google Summer of Code 2015 (CiviCRM) Google Summer of Code 2016 (ModSecurity) Outside Collaborator of SpiderLabs (Powered by TrustWave) Google Android Students Club Facilitator and Organizer 2015 Contact: +92-312-5518018 <+92%20312%205518018> tahirram...@alumni.vu.edu.pk More details about me and my work: GitHub Profile: https://github.com/tahirramzan LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
Re: Explain is slow with tables having many columns
Hi,

(CCing -hackers)

On 2018-09-24 12:22:28 -0700, legrand legrand wrote:
> I have found that explain on tables with many (hundreds) columns
> are slow compared to nominal executions.

Yea, colname_is_unique() (called via make_colname_unique()) is essentially O(#total_columns) and roughly called once for each column in a select list (or using or ...). IIRC we've hit this once when I was at citus, too.

We really should be using a more appropriate data structure here - very likely a hashtable. Unfortunately such a change would likely be a bit too much to backpatch...

Greetings,

Andres Freund
Re: Missing const in DSA.
On Tue, Sep 25, 2018 at 4:17 AM Tom Lane wrote: > Thomas Munro writes: > > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane wrote: > >> Mark G writes: > >>> While looking at some of the recent churn in DSA I noticed that > >>> dsa_size_class_map should probably be declared const. > > >> +1 ... also, given the contents of the array, "char" seems like > >> rather a misnomer. I'd be happier if it were declared as uint8, say. > > > +1 > > Are you planning to take care of this? Will do. -- Thomas Munro http://www.enterprisedb.com
Re: Participate in GCI as a Mentor
Tahir, I've extended an invitation through GCI for you to become a mentor for the PostgreSQL organization, and have sent an additional email containing information on how to get started with contributing to this year's contest. We have one month left to get over a hundred tasks uploaded, so that is our current priority! The PostgreSQL wiki link has the current list of tasks available as well as a full list of other mentors in the program for this year, found here: https://wiki.postgresql.org/wiki/GCI_2018 Thank you very much for your interest - we're thrilled to have you be a part of GCI 2018! Sarah On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan wrote: > Honorable Concern, > > I want to join GCI as a mentor, please guide me about the procedure, > thanks in anticipation. > > -- > Regards > Tahir Ramzan > MSCS Research Scholar > Google Summer of Code 2015 (CiviCRM) > Google Summer of Code 2016 (ModSecurity) > Outside Collaborator of SpiderLabs (Powered by TrustWave) > Google Android Students Club Facilitator and Organizer 2015 > > Contact: > > +92-312-5518018 <+92%20312%205518018> > > tahirram...@alumni.vu.edu.pk > > > More details about me and my work: > > GitHub Profile: https://github.com/tahirramzan > > LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan > -- Sarah Conway Schnurr
auto_explain: Include JIT output if applicable
Hi, Whilst playing around with auto_explain and JIT today, I realized that auto_explain currently doesn't output JIT information, which is rather unfortunate when analyzing a larger set of queries in a semi-automated manner. Attached a trivial patch that fixes the issue and adds JIT information to auto_explain with the same logic as used for regular EXPLAIN. Thanks, Lukas -- Lukas Fittl auto_explain-include-jit-output-v1.patch Description: Binary data
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
On 2018-09-24 18:25:46 +, Nasby, Jim wrote: > > > On Sep 21, 2018, at 12:43 PM, Andres Freund wrote: > > > >> But as far i can see it is possible have aggressive non-wraparound vacuum. > >> One important difference - regular and aggressive regular can be canceled > >> by backend,.wraparound autovacuum can not. (by checking > >> PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c ) > > > > Yes, without checking the code, they should be different. Aggressive is > > controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by > > autovacuum_freeze_max_age (but also implies aggressive). > > Right, except that by the time you get into the vacuum code itself nothing > should really care about that difference. AFAICT, the only thing > is_wraparound is being used for is to set MyPgXact->vacuumFlags |= > PROC_VACUUM_FOR_WRAPAROUND, which prevents the deadlock detector from killing > an autovac process that’s trying to prevent a wraparound. I think it’d be > clearer to remove is_wraparound and move the check from vacuum_rel() into > lazy_vacuum_rel() (which is where the limits for HeapTupleSatisfiesVacuum get > determined). Something like the attached. I'm very doubtful this is an improvement. Especially with the upcoming pluggable storage work making vacuumlazy.c heap specific, while vacuum.c stays generic. The concept of something like PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much (even if criteria for it might). Greetings, Andres Freund
Re: Query is over 2x slower with jit=on
Hi,

On 2018-09-19 20:39:22 -0700, Andres Freund wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > That's going in the right direction. Personally I'd make the last line
> > more like
> >
> > Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms,
> > emission 14.607 ms, total 43.4 ms
>
> Yea, that's probably easier to read.

I'm wondering about upper-casing the individual times (and options) - we're largely upper-casing properties, and for json/xml output each would still be a property. Seems a tad bit more consistent. I now have:

FORMAT text:

JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.298 ms, Inlining 2.250 ms, Optimization 5.797 ms, Emission 5.246 ms, Total 13.591 ms

FORMAT xml:

<JIT>
  <Functions>2</Functions>
  <Options>
    <Inlining>true</Inlining>
    <Optimization>true</Optimization>
    <Expressions>true</Expressions>
    <Deforming>true</Deforming>
  </Options>
  <Timing>
    <Generation>0.651</Generation>
    <Inlining>2.260</Inlining>
    <Optimization>14.752</Optimization>
    <Emission>7.764</Emission>
    <Total>25.427</Total>
  </Timing>
</JIT>

FORMAT json:

"JIT": {
  "Functions": 2,
  "Options": {
    "Inlining": true,
    "Optimization": true,
    "Expressions": true,
    "Deforming": true
  },
  "Timing": {
    "Generation": 0.238,
    "Inlining": 0.807,
    "Optimization": 4.661,
    "Emission": 4.236,
    "Total": 9.942
  }
},

> > (total at the end seems more natural to me, YMMV).
>
> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.
>
> > Also, the "options" format you suggest here seems a bit too biased
> > towards binary on/off options --- what happens when there's a
> > three-way option? So maybe that line should be like
> >
> > Options: inlining on, optimization on
> >
> > though I'm less sure about that part.

Now that space is less of a concern, I added expressions, and deforming as additional options - seems reasonable to have all PGJIT_* options imo. Btw, I chose true/false rather than on/off, to be consistent with ExplainPropertyBool - but I've no strong feelings about it.

Greetings,

Andres Freund
Re: "could not reattach to shared memory" on buildfarm member dory
Noah Misch writes: > On Tue, May 01, 2018 at 11:31:50AM -0400, Tom Lane wrote: >> Well, at this point the only thing that's entirely clear is that none >> of the ideas I had work. I think we are going to be forced to pursue >> Noah's idea of doing an end-to-end retry. Somebody else will need to >> take point on that; I lack a Windows environment and have already done >> a lot more blind patch-pushing than I like in this effort. > Having tried this, I find a choice between performance and complexity. Both > of my designs use proc_exit(4) to indicate failure to reattach. The simpler, > slower design has WIN32 internal_forkexec() block until the child reports (via > SetEvent()) that it reattached to shared memory. This caused a fivefold > reduction in process creation performance[1]. Ouch. > The less-simple, faster design > stashes the Port structure and retry count in the BackendList entry, which > reaper() uses to retry the fork upon seeing status 4. Notably, this requires > new code for regular backends, for bgworkers, and for others. Messy as that is, I think actually the worse problem with it is: > In this proof of concept, the > postmaster does not close its copy of a backend socket until the backend > exits. That seems unworkable because it would interfere with detection of client connection drops. But since you say this is just a POC, maybe you intended to fix that? It'd probably be all right for the postmaster to hold onto the socket until the new backend reports successful attach, using the same signaling mechanism you had in mind for the other way. Overall, I agree that neither of these approaches are exactly attractive. We're paying a heck of a lot of performance or complexity to solve a problem that shouldn't even be there, and that we don't understand well. 
In particular, the theory that some privileged code is injecting a thread into every new process doesn't square with my results at https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us I think our best course of action at this point is to do nothing until we have a clearer understanding of what's actually happening on dory. Perhaps such understanding will yield an idea for a less painful fix. regards, tom lane
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
On Wed, Sep 19, 2018 at 11:23 AM Peter Geoghegan wrote:
> 3 modes
> -------
>
> My new approach is to teach _bt_findsplitloc() 3 distinct modes of
> operation: Regular/default mode, many duplicates mode, and single
> value mode.

I think that I'll have to add a fourth mode, since I came up with another strategy that is really effective though totally complementary to the other 3 -- "multiple insertion point" mode. Credit goes to Kevin Grittner for pointing out that this technique exists about 2 years ago [1].

The general idea is to pick a split point just after the insertion point of the new item (the incoming tuple that prompted a page split) when it looks like there are localized monotonically increasing ranges. This is like a rightmost 90:10 page split, except the insertion point is not at the rightmost page on the level -- it's rightmost within some local grouping of values.

This makes the two largest TPC-C indexes *much* smaller. Previously, they were shrunk by a little over 5% by using the new generic strategy, a win that now seems like small potatoes. With this new mode, TPC-C's order_line primary key, which is the largest index of all, is ~45% smaller following a standard initial bulk load at scalefactor 50. It shrinks from 99,085 blocks (774.10 MiB) to 55,020 blocks (429.84 MiB). It's actually slightly smaller than it would be after a fresh REINDEX with the new strategy. We see almost as big a win with the second largest TPC-C index, the stock table's primary key -- it's ~40% smaller.

Here is the definition of the biggest index, the order line primary key index:

pg@tpcc[3666]=# \d order_line_pkey
      Index "public.order_line_pkey"
  Column   │  Type   │ Key? │ Definition
───────────┼─────────┼──────┼────────────
 ol_w_id   │ integer │ yes  │ ol_w_id
 ol_d_id   │ integer │ yes  │ ol_d_id
 ol_o_id   │ integer │ yes  │ ol_o_id
 ol_number │ integer │ yes  │ ol_number
primary key, btree, for table "public.order_line"

The new strategy/mode works very well because we see monotonically increasing inserts on ol_number (an order's item number), but those are grouped by order. It's kind of an adversarial case for our existing implementation, and yet it seems like it's probably a fairly common scenario in the real world.

Obviously these are very significant improvements. They really exceed my initial expectations for the patch. TPC-C is generally considered to be by far the most influential database benchmark of all time, and this is something that we need to pay more attention to. My sense is that the TPC-C benchmark is deliberately designed to almost require that the system under test have this "multiple insertion point" B-Tree optimization, suffix truncation, etc. This is exactly the same index that we've seen reports of out of control bloat on when people run TPC-C over hours or days [2].

My next task is to find heuristics to make the new page split mode/strategy kick in when it's likely to help, but not kick in when it isn't (when we want something close to a generic 50:50 page split). These heuristics should look similar to what I've already done to get cases with lots of duplicates to behave sensibly. Anyone have any ideas on how to do this?

I might end up inferring a "multiple insertion point" case from the fact that there are multiple pass-by-value attributes for the index, with the new/incoming tuple having distinct-to-immediate-left-tuple attribute values for the last column, but not the first few. It also occurs to me to consider the fragmentation of the page as a guide, though I'm less sure about that. I'll probably need to experiment with a variety of datasets before I settle on something that looks good.
Forcing the new strategy without considering any of this actually works surprisingly well on cases where you'd think it wouldn't, since a 50:50 page split is already something of a guess about where future insertions will end up. [1] https://postgr.es/m/CACjxUsN5fV0kV=yirxwa0s7lqoojuy7soptipdhucemhgwo...@mail.gmail.com [2] https://www.commandprompt.com/blog/postgres_autovacuum_bloat_tpc-c/ -- Peter Geoghegan
Re: Allowing printf("%m") only where it actually works
Michael Paquier writes:
> On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
>> Rebase attached --- no substantive changes.

- if (handleDLL == NULL)
-     ereport(FATAL,
-         (errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError())));

> In 0001, this is replaced by a non-FATAL error for the backend, which
> does not seem like a good idea to me because the user loses visibility
> with this DDL which cannot be loaded. I still have to see this error...

Well, we have to change the code somehow to make it usable in frontend as well as backend. And we can *not* have it do exit(1) in libpq. So the solution I chose was to make it act the same as if FormatMessage were to fail. I don't find this behavior unreasonable: what is really important is the original error code, not whether we were able to pretty-print it. I think the ereport(FATAL) coding is a pretty darn bad idea even in the backend.

> Could you drop the configure checks for snprintf and vsnprintf in a
> separate patch? The cleanup of %m and the removal of those switches
> should be treated separately in my opinion.

Seems a bit make-worky, but here you go. 0001 is the same as before (but rebased up to today, so some line numbers change). 0002 changes things so that we always use our snprintf, removing all the configure logic associated with that. 0003 implements %m in snprintf.c and adjusts our various printf-wrapper functions to ensure that they pass errno through reliably. 0004 changes elog.c to rely on %m being implemented below it.
regards, tom lane diff --git a/configure b/configure index 9b30402..afbc142 100755 *** a/configure --- b/configure *** esac *** 15635,15653 fi - ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror" - if test "x$ac_cv_func_strerror" = xyes; then : - $as_echo "#define HAVE_STRERROR 1" >>confdefs.h - - else - case " $LIBOBJS " in - *" strerror.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS strerror.$ac_objext" - ;; - esac - - fi - ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat" if test "x$ac_cv_func_strlcat" = xyes; then : $as_echo "#define HAVE_STRLCAT 1" >>confdefs.h --- 15635,15640 diff --git a/configure.in b/configure.in index 2e60a89..6973b77 100644 *** a/configure.in --- b/configure.in *** else *** 1678,1684 AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break]) fi ! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen]) case $host_os in --- 1678,1684 AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break]) fi ! 
AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen]) case $host_os in diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c index f4356fe..af35cfb 100644 *** a/src/backend/port/win32/socket.c --- b/src/backend/port/win32/socket.c *** pgwin32_select(int nfds, fd_set *readfds *** 690,728 memcpy(writefds, &outwritefds, sizeof(fd_set)); return nummatches; } - - - /* - * Return win32 error string, since strerror can't - * handle winsock codes - */ - static char wserrbuf[256]; - const char * - pgwin32_socket_strerror(int err) - { - static HANDLE handleDLL = INVALID_HANDLE_VALUE; - - if (handleDLL == INVALID_HANDLE_VALUE) - { - handleDLL = LoadLibraryEx("netmsg.dll", NULL, DONT_RESOLVE_DLL_REFERENCES | LOAD_LIBRARY_AS_DATAFILE); - if (handleDLL == NULL) - ereport(FATAL, - (errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError(; - } - - ZeroMemory(&wserrbuf, sizeof(wserrbuf)); - if (FormatMessage(FORMAT_MESSAGE_IGNORE_INSERTS | - FORMAT_MESSAGE_FROM_SYSTEM | - FORMAT_MESSAGE_FROM_HMODULE, - handleDLL, - err, - MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT), - wserrbuf, - sizeof(wserrbuf) - 1, - NULL) == 0) - { - /* Failed to get id */ - sprintf(wserrbuf, "unrecognized winsock error %d", err); - } - return wserrbuf; - } --- 690,692 diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 16531f7..22e5d87 100644 *** a/src/backend/utils/error/elog.c --- b/src/backend/utils/error/elog.c *** static void send_message_to_server_log(E *** 178,185 static void write_pipe_chunks(char *data, int len, int dest); static void send_message_to_frontend(ErrorData *edata); static char *expand_fmt_string(const char *fmt, ErrorData *edata); - static const char *useful_strerror(int errnum); - static const char *get_errno_symbol(int errnum); static const char *error_severity(int elevel); static void append_with_tabs(StringInfo buf, const char *str); static bool 
is_log_level_output(int elevel, int log_min_level); --- 178,183 *** expand_fmt_st
Re: Proposal for Signal Detection Refactoring
On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> I did some more reading.
>
> On Mon, Sep 24, 2018 at 10:15 AM Chris Travers wrote:
> > First, thanks for taking the time to write this. It's very helpful.
> > Additional thoughts inline.
> >
> > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier wrote:
> >> There could be value in refactoring things so as all the *Pending flags
> >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> >> uses bit-wise markers, as that's at least 4 bytes because that's stored
> >> as an int for most platforms and can be performed as an atomic operation
> >> safely across signals (If my memory is right;) ). And this leaves a lot
> >> of room for future flags.
> >
> > Yeah I will look into this.
>
> Ok so having looked into this a bit more
>
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
> round-trip safe in signal handlers. In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable. So you need atomics, which are
> C11.
>
> What I would suggest instead at least for an initial approach is:
>
> 1. A struct of volatile bools statically stored
> 2. macros for accessing/setting/clearing flags
> 3. Consistent use of these macros throughout the codebase.
>
> To make your solution work it looks like we'd need C11 atomics which would
> be nice, and maybe at some point we decide to allow newer features, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future. It seems that the above solution would strictly comply with
> C89 and pose no concurrency issues.

It's certainly *NOT* ok to use atomics in signal handlers indiscriminately: on some hardware / configure option combinations they're backed by spinlocks (or even semaphores) and thus *NOT* interrupt safe.
This doesn't seem to solve an actual problem, why are we discussing changing this? What'd be measurably improved, worth the cost of making backpatching more painful? Greetings, Andres Freund
Re: [HACKERS] [PATCH v2] Add and report the new "session_read_only" GUC pseudo-variable.
On Thursday, March 1, 2018 4:25:16 AM EDT Andres Freund wrote:
> A CF entry for this patch has been created yesterday, without any
> changes having happened since the above status update. Given that
> there's been no progress for multiple commitfests, and this is the
> last CF, this doesn't seem appropriate. Therefore marked as returned
> with feedback.

Sorry for the long silence on this. I'm attaching a rebased and cleaned-up patch with the earlier review issues addressed. I've checked all relevant tests and verified manually.

Elvis

>From b4b23556552d1fddd3a58ec4db43cbcc62230a87 Mon Sep 17 00:00:00 2001
From: Elvis Pranskevichus
Date: Tue, 12 Sep 2017 11:15:01 -0400
Subject: [PATCH v2] Add and report the new "session_read_only" GUC
 pseudo-variable.

Currently, clients wishing to know whether the server is in hot standby have to resort to polling, which is often suboptimal. This adds the new "session_read_only" GUC variable that is reported via a ParameterStatus message. This allows the clients to: (a) know right away that they are connected to a server in hot standby; and (b) know immediately when a server exits hot standby.

This is immediately useful for target_session_attrs=read-write, as libpq will not need to make a `SHOW transaction_read_only` roundtrip. It is also beneficial to various connection pooling systems (pgpool, pgbouncer etc.)

This also adds support for "target_session_attrs=read-only".

Parts of the patch are based on the work by Tsunakawa Takayuki.
--- doc/src/sgml/libpq.sgml | 4 ++ doc/src/sgml/protocol.sgml | 3 +- src/backend/access/transam/xlog.c| 9 +++- src/backend/commands/async.c | 49 ++ src/backend/storage/ipc/procarray.c | 30 +++ src/backend/storage/ipc/procsignal.c | 3 ++ src/backend/storage/ipc/standby.c| 9 src/backend/tcop/postgres.c | 14 - src/backend/utils/misc/check_guc | 2 +- src/backend/utils/misc/guc.c | 42 ++- src/include/commands/async.h | 7 +++ src/include/storage/procarray.h | 2 + src/include/storage/procsignal.h | 2 + src/include/storage/standby.h| 1 + src/interfaces/libpq/fe-connect.c| 76 src/interfaces/libpq/fe-exec.c | 6 ++- src/interfaces/libpq/libpq-int.h | 3 +- 17 files changed, 243 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 06d909e804..0cd55bff97 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1579,6 +1579,9 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname If this parameter is set to read-write, only a connection in which read-write transactions are accepted by default +is considered acceptable. +If this parameter is set to read-only, only a +connection in which read-write transactions are rejected by default is considered acceptable. The query SHOW transaction_read_only will be sent upon any successful connection; if it returns on, the connection @@ -1929,6 +1932,7 @@ const char *PQparameterStatus(const PGconn *conn, const char *paramName); before 8.1; IntervalStyle was not reported by releases before 8.4; application_name was not reported by releases before 9.0.) + session_read_only was not reported by releases before 12.0.) Note that server_version, server_encoding and diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index f0b2145208..ab8db9dc89 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1291,6 +1291,7 @@ SELCT 1/0; before 8.1; IntervalStyle was not reported by releases before 8.4; application_name was not reported by releases before 9.0.) 
+session_read_only was not reported by releases before 12.0.) Note that server_version, server_encoding and @@ -1586,7 +1587,7 @@ PostgreSQL is tls-server-end-point. user-supplied password in the transmitted password hash. While this prevents the password hash from being successfully retransmitted in a later session, it does not prevent a fake server between the real - server and client from passing through the server's random value + server and client from passing through the server's random value and successfully authenticating. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4754e75436..8d29fd6aa1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7904,8 +7904,10 @@ StartupXLOG(void) * Shutdown the recovery environment. This must occur after * RecoverPreparedTransactions(), see notes for lock_twophase_recover() */ - if (standbyState != STANDBY_DISABLED) + if (standbyState != STANDBY_DISABLED) { ShutdownRecoveryTransactionEnvironment(); + SendHotSta
Re: Collation versioning
It's been a bunch of years since I worked with ICU, so anything I say below may have changed in their code or be subject to mental bit rot.

On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro wrote:
> Considering that to handle this we'd need to figure out
> how to link libicu.so.55, libicu.so.56, ... etc into the same backend,
> and yet they presumably have the same collation names,

There's an option when compiling ICU to version-extend the API names. So, instead of calling ucol_open(), you'd call ucol_open_55() or ucol_open_56(); then you can link to both libicu.so.55 and libicu.so.56 without getting name collisions. The side effect of this is that it's the application's responsibility to figure out which version of ICU "en_US" should be routed to. In DB2, we started the collation names with UCAxxx (later changed to CLDRxxx) to let us distinguish which version of the API to call.

> Admittedly that creates a whole can of worms for initdb-time catalog
> creation, package maintainers' jobs, how long old versions have to be
> supported and how you upgraded database objects to new ICU versions.

Yep. We never came up with a good answer for that before I left IBM. At the time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as part of the install bundle. Long term, I think the only viable approach to supporting multiple versions of ICU is runtime loading of the libraries. Then it's up to the system administrator to make sure the necessary versions are installed on the system.

> Yeah, it seems like ICU is *also* subject to minor changes that happen
> under your feet, much like libc. For example maintenance release 60.2
> (you can't install that at the same time as 60.1, but you can install
> it at the same time as 59.2). You'd be linked against libicu.so.60
> (and thence libicudata.so.60), and it gets upgraded in place when you
> run the local equivalent of apt-get upgrade.
This always worried me because an unexpected collation change is so painful for a database. And I was never able to think of a way of reliably testing compatibility either because of ICU's ability to reorder and group characters when collating.
Re: Missing const in DSA.
Thomas Munro writes: > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane wrote: >> Mark G writes: >>> While looking at some of the recent churn in DSA I noticed that >>> dsa_size_class_map should probably be declared const. >> +1 ... also, given the contents of the array, "char" seems like >> rather a misnomer. I'd be happier if it were declared as uint8, say. > +1 Are you planning to take care of this? regards, tom lane
Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role
Michael Paquier writes: > This should be back-patched. Any opinions about bumping up this > extension version in back-branches like what has been done in 53b79ab4? Yes, you need to bump the extension version to change anything in the extension's script file. For v10 and up, the method used in 53b79ab4 is overcomplicated: you only need to add a delta script not a new base script. (If you had to back-patch before v10, it might be best to add a new base script in all the branches just to keep the patches consistent; but IIUC this issue only arises in v10 and up.) I'd consider following, eg, 7f563c09f as a prototype instead. regards, tom lane
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On 2018-Sep-20, Marco Slot wrote: > We're seeing a segmentation fault when creating a partition of a > partitioned table with a primary key when there is a sql_drop trigger on > Postgres 11beta4. > > We discovered it because the Citus extension creates a sql_drop trigger, > but it's otherwise unrelated to the Citus extension: > https://github.com/citusdata/citus/issues/2390 Thanks for the reproducer. Will research. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Multiple primary key on partition table?
[Back from a week AFK ...] On 2018-Sep-18, amul sul wrote: > On Mon, Sep 17, 2018 at 9:06 PM amul sul wrote: > > > > Nice catch Rajkumar. > Here is the complete patch proposes the aforesaid fix with regression test. Looks closely related to the one that was fixed in 1f8a3327a9db, but of course it's a different code path. Will review this shortly, thanks. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role
On Mon, Sep 24, 2018 at 11:08:14AM +1000, Haribabu Kommi wrote: > In commit 25fff40798 the execute permission of pg_stat_statements_reset() > is provided to pg_read_all_stats role in [1]. > > The execute permissions grant to pg_read_all_stats concern is raised in [2] > during the discussion of supporting different methods of reset the stats, > instead of resetting all. > > Here I attached the patch that reverts the permission grant as per the > discussion > in [3]. This should be back-patched. Any opinions about bumping up this extension version in back-branches like what has been done in 53b79ab4? -- Michael
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote: > Attached new rebased version of the patch that enhances the > pg_stat_statements_reset() > function. This needs to be applied on top of the patch that is posted in > [1].

+CREATE ROLE stats_regress_user1;
+CREATE ROLE stats_regress_user2;

Just a short note: regression tests creating roles should use regress_ as a prefix. -- Michael
Re: [patch]overallocate memory for curly braces in array_out
Keiichi Hirobe writes: > Attached is a patch that fixes a bug > for miscounting total number of curly braces in output string in array_out. Wow, good catch! Testing this, I found there's a second way in which the space calculation is off: it always allocated one more byte than required, as a result of counting one more comma than is really required. That's not nearly as significant as the curly-brace miscount, but it still got in the way of doing this:

*** 1234,1239 ****
--- 1243,1251 ----
  #undef APPENDSTR
  #undef APPENDCHAR

+ /* Assert that we calculated the string length accurately */
+ Assert(overall_length == (p - retval + 1));
+
  pfree(values);
  pfree(needquotes);

which seemed to me like a good idea now that we know this code isn't so perfect as all that. Will push shortly. regards, tom lane
Re: doc - add missing documentation for "acldefault"
On 09/24/2018 10:09 AM, Joe Conway wrote: > On 09/24/2018 10:01 AM, Tom Lane wrote: >> Joe Conway writes: >>> Having seen none, committed/pushed. This did not seem worth >>> back-patching, so I only pushed to master. >> >> I don't see anything on gitmaster? > > Hmm, yes, interesting -- I must have messed up my local git repo somehow. > Will try again. This time it seems to have worked. Sorry for the noise earlier :-/ Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: doc - add missing documentation for "acldefault"
On 09/24/2018 10:01 AM, Tom Lane wrote: > Joe Conway writes: >> Having seen none, committed/pushed. This did not seem worth >> back-patching, so I only pushed to master. > > I don't see anything on gitmaster? Hmm, yes, interesting -- I must have messed up my local git repo somehow. Will try again. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Implementing SQL ASSERTION
On 29/04/2018 20:18, Joe Wildish wrote: > On 28 Mar 2018, at 16:13, David Fetter wrote: >> >> Sorry to bother you again, but this now doesn't compile atop master. > > Attached is a rebased patch for the prototype. I took a look at this. This has been lying around for a few months, so it will need to be rebased again. I applied this patch on top of 68e7e973d22274a089ce95200b3782f514f6d2f8, which was the HEAD around the time this patch was created, and it applies cleanly there. Please check your patch for whitespace errors:

warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

Also, reduce the amount of useless whitespace changes in the patch. There are some compiler warnings:

constraint.c: In function 'CreateAssertion':
constraint.c:1211:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
constraint.c: In function 'oppositeDmlOp':
constraint.c:458:1: error: control reaches end of non-void function [-Werror=return-type]

The version check in psql's describeAssertions() needs to be updated. Also, you should use formatPGVersionNumber() to cope with two-part and one-part version numbers. All this new code in constraint.c that checks the assertion expression needs more comments and documentation. Stuff like this isn't going to work:

static int
funcMaskForFuncOid(Oid funcOid)
{
	char	   *name = get_func_name(funcOid);

	if (name == NULL)
		return OTHER_FUNC;
	else if (strncmp(name, "min", strlen("min")) == 0)
		return MIN_AGG_FUNC;
	else if (strncmp(name, "max", strlen("max")) == 0)
		return MAX_AGG_FUNC;

You can't assume from the name of a function what it's going to do. Solving this properly might be hard. The regression test crashes for me around:

frame #4: 0x00010d3a4cdc postgres`castNodeImpl(type=T_SubLink, ptr=0x7ff27006d230) at nodes.h:582
frame #5: 0x00010d3a61c6 postgres`visitSubLink(node=0x7ff270034040, info=0x7ffee2a23930) at constraint.c:843

This ought to be reproducible for you if you build with assertions.
My feeling is that if we want to move forward on this topic, we need to solve the concurrency question first. All these optimizations for when we don't need to check the assertion are cool, but they are just optimizations that we can apply later on, once we have solved the critical problems. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: doc - add missing documentation for "acldefault"
Joe Conway writes: > Having seen none, committed/pushed. This did not seem worth > back-patching, so I only pushed to master. I don't see anything on gitmaster? regards, tom lane
Re: Proposal for disk quota feature
> > The quotas or object limits, resource limits are pretty useful and > necessary, but I don't see these like new type of objects, it is much more > some property of current objects. Because we have one syntax for this > purpose I prefer it. Because is not good to have two syntaxes for similar > purpose. SCHEMA and TABLE are OK for me. But as I mentioned before, ROLE is a special case when using ALTER SET at this moment. TABLE and SCHEMA are both database-level objects, e.g. pg_class and pg_namespace both reside within one database. But ROLE is cluster-level; roles don't belong to a database. At first glance, ALTER ROLE XXX SET disk_quota = xxx appears to set the quota for the user on all the databases. But in our first-stage design, a ROLE's quota is bound to a specific database. E.g. role Jack could have a 10GB quota on database A and a 2GB quota on database B. SQL syntax is not hard to modify, so I don't think this should block the main design of the disk quota feature. Are there any comments on the design and architecture? If not, we'll submit our patch first and continue the discussion from there. On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule wrote: > > > so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang napsal: > >> But it looks like redundant to current GUC configuration and limits >> >> what do you mean by current GUC configuration? Is that the general block >> number limit in your patch? If yes, the difference between GUC and >> pg_diskquota catalog is that pg_diskquota will store different quota limit >> for the different role, schema or table instead of a single GUC value. >> > > storage is not relevant in this moment. > > I don't see to consistent to sets some limits via SET command, or ALTER X > SET, and some other with CREATE QUOTA ON. > > The quotas or object limits, resource limits are pretty useful and > necessary, but I don't see these like new type of objects, it is much more > some property of current objects. Because we have one syntax for this > purpose I prefer it.
Because is not good to have two syntaxes for similar > purpose. > > So instead CREATE DISC QUOTA ON SCHEMA xxx some value I prefer > > ALTER SCHEMA xxx SET disc_quota = xxx; > > The functionality is +/- same. But ALTER XX SET was introduce first, and I > don't feel comfortable to have any new syntax for similar purpose > > Regards > > Pavel > > > > > >> >> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule >> wrote: >> >>> >>> >>> pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang >>> napsal: >>> just fast reaction - why QUOTA object? > Isn't ALTER SET enough? > Some like > ALTER TABLE a1 SET quota = 1MB; > ALTER USER ... > ALTER SCHEMA .. > New DDL commands looks like too hard hammer. It's an option. Prefer to consider quota setting store together: CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way. (Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.) >>> >>> I have not a problem with new special table for storing this >>> information. But it looks like redundant to current GUC configuration and >>> limits. Can be messy do some work with ALTER ROLE, and some work via CREATE >>> QUOTA. >>> >>> Regards >>> >>> Pavel >>> >>> Here are some differences I can think of: 1 pg_role is a global catalog, not per database level. It's harder to track the user's disk usage in the whole cluster (considering 1000+ databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database. 2 using separate pg_diskquota could add more fields besides the quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects. On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule wrote: > > > pá 21. 9.
2018 v 13:32 odesílatel Hubert Zhang napsal:

>> Hi all, we redesigned the disk quota feature based on the comments from
>> Pavel Stehule and Chapman Flack. Here is the new design.
>>
>> Overview: Basically, the disk quota feature is used to support
>> multi-tenancy environments: different levels of database objects can be
>> given a quota limit to avoid overuse of disk space. A common case could be
>> as follows: the DBA enables disk quota on a specified list of databases,
>> and sets disk quota limits for tables/schemas/roles in these databases. A
>> separate disk quota worker process will monitor the disk usage for these
>> objects and detect the objects which exceed their quota limit. Queries
>> loading data into these "out of disk quota" tables/schemas/roles will be
>> cancelled. We are current
Re: doc - add missing documentation for "acldefault"
On 09/21/2018 01:51 PM, Joe Conway wrote: > On 09/19/2018 11:18 AM, Joe Conway wrote: >> On 09/19/2018 10:54 AM, Tom Lane wrote: >>> So maybe what we really need is a table of operators not functions. >> >> Good idea -- I will take a look at that. >> >>> However, I don't object to documenting any function that has its >>> own pg_description string. > > Ok, so the attached version refactors/splits the group into two tables > -- operators and functions. > I also included John Naylor's patch with some minor editorialization. > > Any further comments or complaints? Having seen none, committed/pushed. This did not seem worth back-patching, so I only pushed to master. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Proposal for Signal Detection Refactoring
Very minor revision On Mon, Sep 24, 2018 at 11:45 AM Chris Travers wrote: > > Ok so having looked into this a bit more > > It looks like to be strictly conforming, you can't just use a series of > flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write > round-trip safe in signal handlers. In other words, if you read, are > pre-empted by another signal, and then write, you may clobber what the > other signal handler did to the variable. So you need atomics, which are > C11 > > What I would suggest instead at least for an initial approach is: > > 1. A struct of volatile bools statically stored > These would be implemented as sig_atomic_t, which is defined in C89 but has no atomic operators other than writing the full value. > 2. macros for accessing/setting/clearing flags > 3. Consistent use of these macros throughout the codebase. > > To make your solution work it looks like we'd need C11 atomics, which would > be nice, and maybe at some point we decide to allow newer features, or we > could wrap this itself in checks for C11 features and provide atomic flags > in the future. It seems that the above solution would strictly comply with > C89 and pose no concurrency issues. > > -- >> Best Regards, >> Chris Travers >> Head of Database >> >> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com >> Saarbrücker Straße 37a, 10405 Berlin >> >> > > -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: Make deparsing of column defaults faster
On 30/07/2018 13:51, Jeff Janes wrote: > Any thoughts on how to proceed here? It seems there is more work to do > to cover all the issues with dumping and restoring tables with many > columns. Since the original report was in the context of pg_upgrade, we > should surely address at least the pg_restore slowness. > > I'll work on solving the problem using a hash table at the lowest > level (making column names unique), for a future commit fest. That > should drop it from N^3 to N^2, which since N can't go above 1600 should > be good enough. > > So we can set this to rejected, as that will be an entirely different > approach. > > Your caching patch might be worthwhile on its own, though. I'm going to set this thread as returned with feedback until we have a more complete solution. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Hint to set owner for tablespace directory
On 11/09/2018 17:10, Peter Eisentraut wrote: > On 07/09/2018 17:59, Maksim Milyutin wrote: >> those directories was that user). The error message "could not set >> permissions on directory ..." disoriented that user. The need to change >> the owner of those directories came after careful reading of >> documentation. I think it would be helpful to show the proposed hint to >> more operatively resolve the problem. > > I think it might be worth clarifying the documentation instead. I'm > looking at the CREATE TABLESPACE reference page and it's not super clear > on first reading. How about the attached patch? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From eb0796921b6ebfe2375037d903a081a7b8f45c0b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 24 Sep 2018 14:47:09 +0200
Subject: [PATCH] doc: Clarify CREATE TABLESPACE documentation

Be more specific about when and how to create the directory and what
permissions it should have.
---
 doc/src/sgml/ref/create_tablespace.sgml | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/create_tablespace.sgml b/doc/src/sgml/ref/create_tablespace.sgml
index 18fa5f0ebf..c621ec2c6b 100644
--- a/doc/src/sgml/ref/create_tablespace.sgml
+++ b/doc/src/sgml/ref/create_tablespace.sgml
@@ -92,7 +92,8 @@ Parameters
        The directory that will be used for the tablespace. The directory
-       should be empty and must be owned by the
+       must exist (CREATE TABLESPACE will not create it),
+       should be empty, and must be owned by the
        PostgreSQL system user. The directory must be
        specified by an absolute path name.
@@ -137,15 +138,23 @@ Notes
   Examples

-  Create a tablespace dbspace at /data/dbs:
+  To create a tablespace dbspace at file system location
+  /data/dbs, first create the directory using operating
+  system facilities and set the correct ownership:
+
+mkdir /data/dbs
+chown postgres:postgres /data/dbs
+
+  Then issue the tablespace creation command inside
+  PostgreSQL:
 CREATE TABLESPACE dbspace LOCATION '/data/dbs';

-  Create a tablespace indexspace at /data/indexes
-  owned by user genevieve:
+  To create a tablespace owned by a different database user, use a command
+  like this:
 CREATE TABLESPACE indexspace OWNER genevieve LOCATION '/data/indexes';
-- 
2.19.0
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On Thu, Sep 20, 2018 at 12:00:18PM +0200, Marco Slot wrote: > We're seeing a segmentation fault when creating a partition of a > partitioned table with a primary key when there is a sql_drop trigger on > Postgres 11beta4. Thanks for reporting; I reproduced this easily, so I added it to the open items list, since indexes on partitioned tables are a feature new in PG11.

Core was generated by `postgres: pryzbyj ts [local] CREATE TABLE '.
Program terminated with signal 11, Segmentation fault.
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at event_trigger.c:1745
1745	event_trigger.c: No such file or directory.
	in event_trigger.c
(gdb) bt
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at event_trigger.c:1745
#1  0x005dfbd3 in AlterTableInternal (relid=40108800, cmds=0x21c39a8, recurse=true) at tablecmds.c:3328
#2  0x005b5b7b in DefineIndex (relationId=40108800, stmt=0x21a7350, indexRelationId=0, parentIndexId=40084714, parentConstraintId=40084715, is_alter_table=false, check_rights=false, check_not_in_use=false, skip_build=false, quiet=false) at indexcmds.c:669
#3  0x005dcfee in DefineRelation (stmt=0x2116690, relkind=114 'r', ownerId=17609, typaddress=0x0, queryString=0x20f1e60 "CREATE TABLE collections_list_1\nPARTITION OF collections_list (key, ts, collection_id, value)\nFOR VALUES IN (1);") at tablecmds.c:946
[...]

Justin
Re: Proposal for Signal Detection Refactoring
I did some more reading. On Mon, Sep 24, 2018 at 10:15 AM Chris Travers wrote: > First, thanks for taking the time to write this. It's very helpful. > Additional thoughts inline. > > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier > wrote: > >> >> There could be value in refactoring things so as all the *Pending flags >> of miscadmin.h get stored into one single volatile sig_atomic_t which >> uses bit-wise markers, as that's at least 4 bytes because that's stored >> as an int for most platforms and can be performed as an atomic operation >> safely across signals (If my memory is right;) ). And this leaves a lot >> of room for future flags. >> > > Yeah I will look into this. > Ok so having looked into this a bit more It looks like to be strictly conforming, you can't just use a series of flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write round-trip safe in signal handlers. In other words, if you read, are pre-empted by another signal, and then write, you may clobber what the other signal handler did to the variable. So you need atomics, which are C11 What I would suggest instead at least for an initial approach is: 1. A struct of volatile bools statically stored 2. macros for accessing/setting/clearing flags 3. Consistent use of these macros throughout the codebase. To make your solution work it looks like we'd need C11 atomics, which would be nice, and maybe at some point we decide to allow newer features, or we could wrap this itself in checks for C11 features and provide atomic flags in the future. It seems that the above solution would strictly comply with C89 and pose no concurrency issues. -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: Pluggable Storage - Andres's take
On Mon, Sep 24, 2018 at 8:04 AM Haribabu Kommi wrote: > On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov > wrote: >> >> On Fri, Aug 24, 2018 at 5:50 AM Andres Freund wrote: >> > I've pushed a current version of that to my git tree to the >> > pluggable-storage branch. It's not really a version that I think makese >> > sense to review or such, but it's probably more useful if you work based >> > on that. There's also the pluggable-zheap branch, which I found >> > extremely useful to develop against. >> >> BTW, I'm going to take a look at current shape of this patch and share >> my thoughts. But where are the branches you're referring? On your >> postgres.org git repository pluggable-storage brach was updates last >> time at June 7. And on the github branches are updated at August 5 >> and 14, and that is still much older than your email (August 24)... > > > The code is latest, but the commit time is older, I feel that is because of > commit squash. > > pluggable-storage is the branch where the pluggable storage code is present > and pluggable-zheap branch where zheap is rebased on top of pluggable > storage. Got it, thanks! -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Global snapshots
Hi! I want to review this patch set. Though I understand that it probably will be quite a long process. I like the idea that with this patch set universally all postgres instances are bound into a single distributed DB, even if they never heard about each other before :) This is just amazing. Or do I get something wrong? I've got a few questions: 1. If we coordinate HA-clusters with replicas, can replicas participate if their part of the transaction is read-only? 2. How does an InDoubt transaction behave when we add or subtract leap seconds? Also, I could not understand some notes from Arseny: > On Jul 25, 2018, at 16:35, Arseny Sher wrote: > > * One drawback of these patches is that only REPEATABLE READ is > supported. For READ COMMITTED, we must export every new snapshot > generated on coordinator to all nodes, which is fairly easy to > do. SERIALIZABLE will definitely require chattering between nodes, > but that's much less demanded isolevel (e.g. we still don't support > it on replicas). If all shards are executing transactions at SERIALIZABLE, what anomalies does it permit? If you have transactions on server A and server B, there are transactions 1 and 2, transaction A1 is serialized before A2, but B1 is after B2, right? Maybe we can somehow abort 1 or 2? > > * Another somewhat serious issue is that there is a risk of recency > guarantee violation. If client starts transaction at node with > lagging clocks, its snapshot might not include some recently > committed transactions; if client works with different nodes, she > might not even see her own changes. CockroachDB describes at [1] how > they and Google Spanner overcome this problem. In short, both set > hard limit on maximum allowed clock skew. Spanner uses atomic > clocks, so this skew is small and they just wait it at the end of > each transaction before acknowledging the client.
In CockroachDB, if > tuple is not visible but we are unsure whether it is truly invisible > or it's just the skew (the difference between snapshot and tuple's > csn is less than the skew), transaction is restarted with advanced > snapshot. This process is not infinite because the upper border > (initial snapshot + max skew) stays the same; this is correct as we > just want to ensure that our xact sees all the committed ones before > it started. We can implement the same thing. I think that this situation is also covered in Clock-SI since transactions will not exit InDoubt state before we can see them. But I'm not sure, chances are that I get something wrong, I'll think more about it. I'd be happy to hear comments from Stas about this. > > > * 003_bank_shared.pl test is removed. In current shape (loading one > node) it is useless, and if we bombard both nodes, deadlock surely > appears. In general, global snaphots are not needed for such > multimaster-like setup -- either there are no conflicts and we are > fine, or there is a conflict, in which case we get a deadlock. Can we do something with this deadlock? Will placing an upper limit on time of InDoubt state fix the issue? I understand that aborting automatically is kind of dangerous... Also, currently hanging 2pc transaction can cause a lot of headache for DBA. Can we have some kind of protection for the case when one node is gone permanently during transaction? Thanks! Best regards, Andrey Borodin.
Re: Proposal for Signal Detection Refactoring
First, thanks for taking the time to write this. Its very helpful. Additional thoughts inline. On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier wrote: > On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote: > > I understand how lock levels don't fit a simple hierarchy but at least > > when it comes to what is going to be aborted on a signal, I am having > > trouble understanding the problem here. > > It may be possible to come with a clear hierarchy with the current > interruption types in place. Still I am not sure that the definition > you put behind is completely correct, and I think that we need to > question as well the value of putting such restrictions for future > interruption types because they would need to fit into it. The future-safety issue is a really good one and it's one reason I kept the infinite loop patch as semantically consistent with the API as I could at the cost of some complexity. I have another area where I think a patch would be more valuable anyway in terms of refactoring. > That's quite > a heavy constraint to live with. There is such logic with wal_level for > example, which is something I am not completely happy with either... > But this one is a story for another time, and another thread. > From a cleanup perspective a concentric circles approach seems like it is correct to me (which would correspond to a hierarchy of interrupts) but I can see that assuming that all pending interrupts would be checked solely for cleanup reasons might be a bad assumption on my part. > > Regarding your patch, it seems to me that it does not improve > readability as I mentioned up-thread because you lose sight of what can > be interrupted in a given code path, which is what the current code > shows actually nicely. > So I guess there are two fundamental questions here. 1. Do we want to move away from checking global flags like this directly? 
I think we do because it makes future changes possibly harder and more complex since there is no encapsulation of logic. But I don't see a point in putting effort into that without consensus. > > There could be value in refactoring things so as all the *Pending flags > of miscadmin.h get stored into one single volatile sig_atomic_t which > uses bit-wise markers, as that's at least 4 bytes because that's stored > as an int for most platforms and can be performed as an atomic operation > safely across signals (If my memory is right;) ). And this leaves a lot > of room for future flags. > Yeah I will look into this. Thanks again for taking the time to go over the concerns in detail. It really helps. Best Wishes, Chris Travers > -- > Michael > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin