Re: chained transactions

2018-12-26 Thread Fabien COELHO


Hello Peter,


Shouldn't psql tab completion be updated as well?


Done.  (I only did the AND CHAIN, not the AND NO CHAIN.)


Sure, that is the useful part.


I must admit that I do not much like the three global variables &
Save/Restore functions. I'd suggest saving directly into 3 local variables
in function CommitTransactionCommand, and restoring them when needed. The
code should not be longer. I would not bother to skip saving when not
chaining.


We're also using those functions in the 0002 patch.


Hmmm. I have not looked at the second one yet.


Should we just repeat the code three times, or use macros?


I try to avoid global variables when possible, on principle, because I 
paid to learn that they are bad :-) Maybe I'd use a local struct, declare an 
instance within the function, and write two functions to dump/restore the 
transaction status variables into the referenced struct. Maybe this is not 
worth the effort.
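
For what it's worth, here is a minimal sketch of what I have in mind,
assuming the three variables at stake are the usual transaction
characteristics XactIsoLevel, XactReadOnly and XactDeferrable (the struct
and helper names are made up):

    #include "postgres.h"
    #include "access/xact.h"  /* XactIsoLevel, XactReadOnly, XactDeferrable */

    /* hypothetical local container for the transaction characteristics */
    typedef struct SavedXactChars
    {
        int     save_XactIsoLevel;
        bool    save_XactReadOnly;
        bool    save_XactDeferrable;
    } SavedXactChars;

    static void
    SaveXactChars(SavedXactChars *chars)
    {
        chars->save_XactIsoLevel = XactIsoLevel;
        chars->save_XactReadOnly = XactReadOnly;
        chars->save_XactDeferrable = XactDeferrable;
    }

    static void
    RestoreXactChars(const SavedXactChars *chars)
    {
        XactIsoLevel = chars->save_XactIsoLevel;
        XactReadOnly = chars->save_XactReadOnly;
        XactDeferrable = chars->save_XactDeferrable;
    }

CommitTransactionCommand would then declare one such struct on its stack
and call the two helpers around the chained commit, with no global state
involved.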



Copying & comparing nodes are updated. Should making, outing and reading
nodes also be updated?


TransactionStmt isn't covered by the node serialization functions, so I
didn't see anything to update.  What did you have in mind?


Sigh. I had in mind that the serialization feature would work with all 
possible nodes, not just some of them… which seems quite naïve. The whole 
set of make/copy/cmp/in/out functions depresses me; all this verbose code 
should be automatically generated from the struct declarations. I'm pretty 
sure there are hidden bugs in there.



About the tests: I'd suggest using more options on the different tests,
e.g. SERIALIZABLE, READ ONLY… Also, ISTM that the tests should show the
transaction_read_only value as well.


OK, updated a bit.  (Using READ ONLY doesn't work because then you can't
write anything inside the transaction.)


Sure. Within a read-only transaction, though, it could check that 
transaction_read_only is on, and that it is still on when chained.



Updated patch attached.


First patch applies cleanly, compiles, make check ok.

Further remarks, distinct from the above comments and suggestions:

The documentation should probably mention in the compatibility sections 
that AND CHAIN is standard-conforming.


In the automatic rollback tests, maybe I'd insert the value 3 just once, 
and use another value for the chained transaction.


Tests could also vary BEGIN/START, COMMIT/END, ROLLBACK/ABORT.

As you added a SAVEPOINT, maybe I'd try rolling back to it.

--
Fabien.

Re: Feature: triggers on materialized views

2018-12-26 Thread Mitar
Hi!

I did a bit of benchmarking. It seems my version with UPDATE actually
takes slightly less time (~5%).


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar  wrote:
>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> rows instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar  wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter  wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined whether this is
> > a viable feature to eventually be included. You gave me some
> > suggestions for improving my patch (adding tests and so on). Does this
> > mean that the patch should be fully done before a decision is made?
> >
> > Also, is the workflow, for now, that I improve things and resubmit a
> > patch to the mailing list?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert
> > (which is what currently happens), I would prefer an update. Then the
> > UPDATE trigger operation would happen as well. Maybe the INSERT query
> > could be changed to an INSERT ... ON CONFLICT DO UPDATE query (not
> > sure whether that fires the UPDATE trigger operation on conflict),
> > and the REMOVE changed to remove only rows which were really removed,
> > not ones that were merely updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over the OLD and NEW REFERENCING TABLEs, so if they are empty, nothing
> > would be done. But it is surprising that the DELETE trigger is called
> > even when no rows are actually deleted from the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are merely updated currently get the DELETE trigger
> > called first and then INSERT. The issue is that if I am observing this
> > behavior from outside, when I see a DELETE it is unclear whether a row
> > has really been deleted or whether an INSERT will happen later on. I
> > have to wait for a possible INSERT to determine that. But how long
> > should I wait? It makes consuming these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker for the user. Instead of updating the
> > rendered row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know exactly how.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of a
> > full refresh, because that is what actually happens. If we wanted
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > the concurrent version has to, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So a per-statement trigger would have TRUNCATE and INSERT fired, and a
> > per-row trigger would have TRUNCATE and per-row INSERTs fired.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: chained transactions

2018-12-26 Thread Fabien COELHO




Updated patch attached.  The previous (v2) patch apparently didn't apply
anymore.


Second patch applies cleanly, compiles, "make check" ok.

As I do not know much about the SPI stuff, some of the comments below may 
be very stupid.


I'm wary of changing the SPI_commit and SPI_rollback interfaces, which are 
certainly being used outside the source tree, so this could break countless 
pieces of code; it also seems quite unclean for commit and rollback to do 
anything other than committing or rolling back.


ISTM that it should be kept as is and only managed from the PL/pgSQL 
exec_stmt_* functions, which have to be adapted anyway. That would 
minimise changes and not break existing code.


If the SPI_* functions are modified, which I would advise against, I find 
keeping the following assignment in the chained case doubtful:


_SPI_current->internal_xact = false;

--
Fabien.



Re: chained transactions

2018-12-26 Thread Fabien COELHO




Updated patch attached.  The previous (v2) patch apparently didn't apply
anymore.


Second patch applies cleanly, compiles, "make check" ok.


Also, about testing: I'd do fewer rounds; 4 seems quite enough.

--
Fabien.



Re: CF app feature request

2018-12-26 Thread Magnus Hagander
On Sun, Dec 23, 2018 at 3:59 PM Alvaro Herrera 
wrote:

> On 2018-Dec-23, Magnus Hagander wrote:
>
> > On Wed, Nov 21, 2018 at 12:52 AM Michael Paquier 
> > wrote:
> >
> > > On Tue, Nov 20, 2018 at 03:30:38PM -0300, Alvaro Herrera wrote:
> > > > On 2018-Nov-20, Tom Lane wrote:
> > > > Certainly not higher than having the dropdown for entry
> author/reviewer
> > > > be sorted alphabetically ... *wink* *wink*
> > >
> > > More *wink* *wink*
> >
> > Here's a slightly late response to that winking :P
> >
> > They *are* sorted. By lastname. Should I take all this winking to mean
> > that people would prefer them being sorted by firstname? :)
>
> Oh, wow. That's not at all obvious ... the fact that there's Scherbaum
> as the first entry, and Wang at the fifth position, threw me off.
>

Oh, there isn't, there's "ads Scherbaum" as his last name. I'm pretty sure
he does that just to mess with people. And for the second one, it's
Alexandra... But yes, I agree it's confusing, especially when you start
mixing in cultures that may not have firstname/lastname working in the way
that we do in the western world.


> I can find entries easily enough knowing this.  But I think it would
> make more sense to order by the complete string that makes up the name.
>

Ok, I've pushed a change to do first_name sorting instead. Let's see if
that causes less or more confusion :)  But it will at least match the "full
string sorting".

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Alternative to \copy in psql modelled after \g

2018-12-26 Thread Daniel Verite
Fabien COELHO wrote:

> Is this just kind of a bug fix? Beforehand the documentation says "\g fn" 
> sends to file, but that was not happening with COPY, and now it does as it 
> says?

Yes. The doc says about \g:

  Sends the current query buffer to the server for execution. If an
  argument is given, the query's output is written to the named file
  or piped to the given shell command, instead of displaying it as
  usual.

It does not add "unless the query is a COPY", so it seems right
to make that just work, and call it a bug fix.

> A question: if opening the output file fails, should not the query
> be cancelled and an error be reported? Maybe it is too late for
> that. It seems that "SELECT pg_sleep(10) \g /bad/file" fails but
> executes the query nevertheless.

Yes. This part works as documented:

  "The file or command is written to only if the query successfully
  returns zero or more tuples, not if the query fails or is a
  non-data-returning SQL command."

> However, it does not contain any tests (bad:-)

Testing this requires at least some interaction with the OS, which I'm
uncomfortable adding. There is a precedent in
regress/sql/hs_standby_allowed.sql doing:

  COPY hs1 TO '/tmp/copy_test'
  \! cat /tmp/copy_test

We could add something like that in psql.sql, but I'm not fond of it
because it assumes a Unix-ish environment, and that it's OK to clobber
the hardcoded /tmp/copy_test should it preexist. I'd rather work with
a dedicated temporary directory. On Linux mktemp -d could be used, but
I don't know about a portable solution, plus POSIX has deprecated
mktemp.
I'm open to ideas on a portable way for psql.sql to test \g writing to a
file or a program, but ATM my inclination is to not add that test.


> nor documentation (hmmm... maybe none needed, though).

\copy has this paragraph:

  "The syntax of this command is similar to that of the SQL COPY
  command. All options other than the data source/destination are as
  specified for COPY. Because of this, special parsing rules apply to
  the \copy meta-command. Unlike most other meta-commands, the entire
  remainder of the line is always taken to be the arguments of \copy,
  and neither variable interpolation nor backquote expansion are
  performed in the arguments".

We could add that COPY TO STDOUT with a redirection might be used as an
alternative. The downside is that with four paragraphs plus one tip,
the explanations on \copy already give a lot to chew on, so is it
worth adding more?


> ISTM that overriding copystream on open failures is not necessary, because 
> its value is already NULL in this case.
> 
> I'd suggest that is_pipe should be initialized to false, otherwise it is 
> unclear from the code when it is set before use, and some compilers may 
> complain.

OK, will consider these in the next revision.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: chained transactions

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-26, Fabien COELHO wrote:

> > > Copying & comparing nodes are updated. Should making, outing and reading
> > > nodes also be updated?
> > 
> > TransactionStmt isn't covered by the node serialization functions, so I
> > didn't see anything to update.  What did you have in mind?
> 
> Sigh. I had in mind that the serialization feature would work with all
> possible nodes, not just some of them… which seems quite naïve. The whole
> set of make/copy/cmp/in/out functions depresses me; all this verbose code
> should be automatically generated from the struct declarations. I'm pretty
> sure there are hidden bugs in there.

There may well be, but keep in mind that the nodes that have out and
read support are used in view declarations and such (stored rules); they
are used pretty extensively.  Nodes that cannot be part of stored rules
don't need to have read support.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera  
> wrote:
> > But then I'm not clear *why* you would like to do a non-concurrent
> > refresh.
> 
> I mostly wanted to support it for two reasons:
> 
> - completeness: maybe we cannot imagine the use case yet, but somebody
> might in the future

Understood.  We don't like features that fail to work in conjunction
with other features, so this is a good goal to keep in mind.

> - getting trigger calls for initial inserts: you can then create a
> materialized view without data, attach triggers, and then run a
> regular refresh; this allows you to have only one code path to process
> any (including initial) changes to the view through notifications,
> instead of handling the initial data differently

Sounds like you could do this by fixing concurrent refresh to also work
when the MV is WITH NO DATA.

> > Maybe your situation would be best served by forbidding non-
> > concurrent refresh when the MV contains any triggers.
> 
> If this would be acceptable by the community, I could do it.

I think your chances are 50%/50% that this would appear acceptable.

> > Alternatively, maybe reimplement non-concurrent refresh so that it works
> > identically to concurrent refresh (except with a stronger lock).  Not
> > sure if this implies any performance penalties.
> 
> Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> That would then generate reasonable trigger calls.

Right.

> Are there any existing benchmarks for such operations I could use to
> see if there are any performance changes if I change the implementation
> here? Any guidelines on how to evaluate this?

Not that I know of.  Typically the developer of a feature comes up with
appropriate performance tests as well, targeting average and worst cases.

If the performance worsens with the different implementation, one idea
is to keep both and only use the slow one when triggers are present.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Alternative to \copy in psql modelled after \g

2018-12-26 Thread Fabien COELHO



Hello,


Is this just kind of a bug fix? Beforehand the documentation says "\g fn"
sends to file, but that was not happening with COPY, and now it does as it
says?


Yes. [...]

It does not add "unless the query is a COPY", so it seems right
to make that just work, and call it a bug fix.


Does this suggest backpatching?


However, it does not contain any tests (bad:-)


Testing this requires at least some interaction with the OS, which I'm
uncomfortable adding.


Hmmm. This means that "\g filename" goes fully untested:-( A casual grep 
seems to confirm this. Sigh:-(



There is a precedent in regress/sql/hs_standby_allowed.sql doing:

 COPY hs1 TO '/tmp/copy_test'
 \! cat /tmp/copy_test


Indeed. I'm unsure Windows has cat or /tmp, so I do not understand how it 
works on such platforms. Maybe I'm missing something.



We could add something like that in psql.sql, but I'm not fond of it
because it assumes a Unix-ish environment,


Yep.


I'm open to ideas on a portable way for psql.sql to test \g writing to a
file or a program, but ATM my inclination is to not add that test.


Could a relative file be OK? After some testing, the standard regression 
tests do not cd to some special place, so it may fail?


However, TAP tests do that, and I have used this extensively with pgbench, 
so a psql TAP test could do that and other things, such as importing a CSV 
file or whatever.



nor documentation (hmmm... maybe none needed, though).


\copy has this paragraph:

 "The syntax of this command is similar to that of the SQL COPY
 command. All options other than the data source/destination are as
 specified for COPY. Because of this, special parsing rules apply to
 the \copy meta-command. Unlike most other meta-commands, the entire
 remainder of the line is always taken to be the arguments of \copy,
 and neither variable interpolation nor backquote expansion are
 performed in the arguments".

We could add that COPY TO STDOUT with a redirection might be used as an
alternative. The downside is that with four paragraphs plus one tip,
the explanations on \copy already give a lot to chew on, so is it
worth adding more?


I'd say that a small paragraph with the tip would be ok.

--
Fabien.



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-12-26 Thread Alexey Kondratov

Greetings,



- Reusing the GUC parser is something I would avoid as well.  Not worth
the complexity.

Yes, I don't like it either. I will try to make guc-file.l frontend safe.

Any success with that?


I looked into it and found that currently guc-file.c is built as part 
of guc.c, so it seems to be even more complicated to decouple 
guc-file.c from the backend. Thus, I have the following plan for how to 
proceed with the patch:


1) Add guc-file.h and build guc-file.c separately from guc.c

2) Put guc-file.l / guc-file.h into common/*

3) Isolate all backend-specific calls in guc-file.l with #ifdef FRONTEND

Though I am not sure that this work is worth doing compared with the 
extra redundancy added by simply adding a frontend-safe copy of the 
guc-file.l lexer. If someone has any thoughts I would be glad to receive 
comments.




I have finally worked it out. Now there is a common version of 
guc-file.l, and guc-file.c is built separately from guc.c. I had to use a 
limited number of #ifndef FRONTEND blocks, mostly to replace ereport 
calls. Also, ProcessConfigFile and ProcessConfigFileInternal have been 
moved inside guc.c explicitly, as they are backend-specific. So for me 
this solution looks much more concise and neat.
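
To give an idea of the kind of substitution involved -- this is only a
sketch of the pattern, not a hunk from the patch; the function name and
message are illustrative:

    #ifndef FRONTEND
    #include "postgres.h"
    #else
    #include "postgres_fe.h"
    #include <errno.h>
    #include <string.h>
    #endif

    /* illustrative error-reporting shim for the shared lexer */
    static void
    report_file_open_failure(const char *config_file)
    {
    #ifndef FRONTEND
        /* backend: go through the regular error machinery */
        ereport(LOG,
                (errcode_for_file_access(),
                 errmsg("could not open configuration file \"%s\": %m",
                        config_file)));
    #else
        /* frontend: no ereport available, fall back to stderr */
        fprintf(stderr, "could not open configuration file \"%s\": %s\n",
                config_file, strerror(errno));
    #endif
    }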


Please find the new version of the patch attached. TAP tests have been 
updated as well, in order to handle a restore_command specified both on 
the command line and in postgresql.conf.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 8a6c9f89f45c9568d95e05b0586d1cc54905e6de Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 src/backend/Makefile  |   4 +-
 src/backend/commands/extension.c  |   1 +
 src/backend/utils/misc/Makefile   |   8 -
 src/backend/utils/misc/guc.c  | 434 +--
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/RewindTest.pm   |  96 +++-
 src/bin/pg_rewind/parsexlog.c | 182 ++-
 src/bin/pg_rewind/pg_rewind.c |  91 +++-
 src/bin/pg_rewind/pg_rewind.h |  12 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/common/Makefile   |   7 +-
 src/{backend/utils/misc => common}/guc-file.l | 514 --
 src/include/common/guc-file.h |  50 ++
 src/include/utils/guc.h   |  39 +-
 src/tools/msvc/Mkvcbuild.pm   |   2 +-
 src/tools/msvc/clean.bat  |   2 +-
 18 files changed, 952 insertions(+), 504 deletions(-)
 rename src/{backend/utils/misc => common}/guc-file.l (60%)
 create mode 100644 src/include/common/guc-file.h

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 25eb043941..ddbe2f3fce 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -186,7 +186,7 @@ distprep:
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	distprep
-	$(MAKE) -C utils/misc	guc-file.c
+	$(MAKE) -C common	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
 
 
@@ -307,7 +307,7 @@ maintainer-clean: distclean
 	  replication/syncrep_scanner.c \
 	  storage/lmgr/lwlocknames.c \
 	  storage/lmgr/lwlocknames.h \
-	  utils/misc/guc-file.c \
+	  common/guc-file.c \
 	  utils/sort/qsort_tuple.c
 
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 31dcfe7b11..ec0367d068 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -47,6 +47,7 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/guc-file.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index a53fcdf188..2e6a879c46 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -25,11 +25,3 @@ override CPPFLAGS += -DPG_KRB_SRVTAB='"$(krb_srvtab)"'
 endif
 
 include $(top_srcdir)/src/backend/common.mk
-
-# guc-file is compiled as part of guc
-guc.o: guc-file.c
-
-# Note: guc-file.c is not deleted by 'make clean',
-# since we want to ship it in distribution tarballs.
-clean:
-	@rm -f lex.yy.c
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..a866503186 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigger.h"
+#include "common/guc-file.h"
 #include "common/string.h"
 #include "funcapi.h"
 #include "jit/jit.h"
@@ -210,7 +211,6 @@ static bool check_r

Re: Change pgarch_readyXlog() to return .history files first

2018-12-26 Thread David Steele

On 12/24/18 1:31 PM, Michael Paquier wrote:

On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote:

Thanks for the lookups.  I can see that the patch applies without
conflicts down to 9.4, and based on the opinions gathered on this
thread back-patching this stuff is the consensus, based on input from
Kyotaro Horiguchi and David Steele (I don't mind much myself).  So,
any objections from others in doing so?


On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed
"only" down to 9.5.  Thanks David for the patch and Horiguchi-san for
the review.


Thanks, Michael!

I'm not too worried about 9.4 since it is currently the oldest supported 
version.  HA users tend to be on the leading edge of the upgrade curve 
and others have the opportunity to upgrade if the reordering will help them.


--
-David
da...@pgmasters.net



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 19, 2018 at 8:01 PM Andres Freund  wrote:
> The last time I looked into perfect hash functions, it wasn't easy to
> find a generator that competed with a decent normal hashtable (in
> particular gperf's are very unconvincing). The added tooling is a
> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
> search, where the latter will usually lose.  Wonder if we shouldn't
> generate a serialized non-perfect hashtable instead. The lookup code for
> a read-only hashtable without concern for adversarial input is pretty
> trivial.

I wonder if we could do something really simple like a lookup based on
the first character of the scan keyword. It looks to me like there are
440 keywords right now, and the most common starting letter is 'c',
which is the first letter of 51 keywords. So dispatching based on the
first letter clips at least 3 steps off the binary search.  I don't
know whether that's enough to be worthwhile, but it's probably pretty
simple to implement.
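
In case it helps to visualize it, here is a rough sketch of that dispatch
(the range table would be emitted by the keyword list generator; every
name here is made up, the struct is a simplified stand-in for the real
ScanKeyword, and the word is assumed to be already downcased):

    #include <string.h>

    typedef struct ScanKeyword
    {
        const char *name;       /* keyword, in lower case */
        short       value;      /* grammar's token code */
    } ScanKeyword;

    /* per-letter [first, last] index range into the sorted keyword
     * array; last < first means no keyword starts with that letter */
    typedef struct { short first; short last; } KwLetterRange;

    extern const ScanKeyword ScanKeywords[];
    extern const KwLetterRange KwLetterRanges[26];

    static const ScanKeyword *
    keyword_lookup_by_letter(const char *word)
    {
        int lo, hi;

        if (word[0] < 'a' || word[0] > 'z')
            return NULL;        /* all keywords start with a-z */
        lo = KwLetterRanges[word[0] - 'a'].first;
        hi = KwLetterRanges[word[0] - 'a'].last;
        while (lo <= hi)        /* plain binary search, on ~1/26 of the list */
        {
            int mid = (lo + hi) / 2;
            int cmp = strcmp(ScanKeywords[mid].name, word);

            if (cmp == 0)
                return &ScanKeywords[mid];
            if (cmp < 0)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return NULL;            /* not a keyword */
    }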

I'm not sure that I understand quite what you have in mind for a
serialized non-perfect hashtable.  Are you thinking that we'd just
construct a simplehash and serialize it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 19, 2018 at 8:01 PM Andres Freund  wrote:
>> The last time I looked into perfect hash functions, it wasn't easy to
>> find a generator that competed with a decent normal hashtable (in
>> particular gperf's are very unconvincing). The added tooling is a
>> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
>> search, where the latter will usually lose.  Wonder if we shouldn't
>> generate a serialized non-perfect hashtable instead. The lookup code for
>> a read-only hashtable without concern for adversarial input is pretty
>> trivial.

> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dispatching based on the
> first letter clips at least 3 steps off the binary search.  I don't
> know whether that's enough to be worthwhile, but it's probably pretty
> simple to implement.

I think there's a lot of goalpost-moving going on here.  The original
idea was to trim the physical size of the data structure, as stated
in the thread subject, and just reap whatever cache benefits we got
along the way from that.  I am dubious that we actually have any
performance problem in this code that needs a big dollop of added
complexity to fix.

In my hands, the only part of the low-level parsing code that commonly
shows up as interesting in profiles is the Bison engine.  That's probably
because the grammar tables are circa half a megabyte and blow out cache
pretty badly :-(.  I don't know of any way to make that better,
unfortunately.  I suspect that it's just going to get worse, because
people keep submitting additions to the grammar.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread David Fetter
On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
> 
> In my hands, the only part of the low-level parsing code that
> commonly shows up as interesting in profiles is the Bison engine.

Should we be considering others? As I understand it, progress has been
made in this field since yacc was originally designed. Is LALR
actually suitable for languages like SQL, or is it just there for
historical reasons?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Tom Lane
Thomas Munro  writes:
> Since it's not fixing a bug, we wouldn't back-patch that into existing
> releases.  But I agree that we should do something like this for
> PostgreSQL 12, and I think we should make it user configurable.

I'm -1 on making this user configurable via a GUC; that adds documentation
and compatibility burdens that we don't need, for something of no value
to 99.99% of users.  The fact that the default would need to be
platform-dependent just makes that tradeoff even worse.  I think the other
0.01% who need to change the default (and are bright enough to be doing
the right thing for the right reasons) could certainly handle something
like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
for a precedent that I think applies well here.  So I'd favor just doing
it that way.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 26, 2018 at 11:22 AM Tom Lane  wrote:
> I think there's a lot of goalpost-moving going on here.  The original
> idea was to trim the physical size of the data structure, as stated
> in the thread subject, and just reap whatever cache benefits we got
> along the way from that.  I am dubious that we actually have any
> performance problem in this code that needs a big dollop of added
> complexity to fix.

I have seen ScanKeywordLookup show up in profiles quite often and
fairly prominently -- like several percent of total runtime. I'm not
trying to impose requirements on John's patch, and I agree that
reducing the physical size of the structure is a good step whether
anything else is done or not. However, I don't see that as a reason to
shut down further discussion of other possible improvements.  If his
patch makes this disappear from profiles, cool, but if it doesn't,
then sooner or later somebody's going to want to do more.

FWIW, my bet is this helps but isn't enough to get rid of the problem
completely.  A 9-step binary search has got to be slower than a really
well-optimized hash table lookup. In a perfect world the latter
touches the cache line containing the keyword -- which presumably is
already in cache since we just scanned it -- then computes a hash
value without touching any other cache lines -- and then goes straight
to the right entry.  So it touches ONE new cache line.  That might be a
level of optimization that's hard to achieve in practice, but I don't
think it's crazy to want to get there.

> In my hands, the only part of the low-level parsing code that commonly
> shows up as interesting in profiles is the Bison engine.  That's probably
> because the grammar tables are circa half a megabyte and blow out cache
> pretty badly :-(.  I don't know of any way to make that better,
> unfortunately.  I suspect that it's just going to get worse, because
> people keep submitting additions to the grammar.

I'm kinda surprised that you haven't seen ScanKeywordLookup() in
there, but I agree with you that the size of the main parser tables is
a real issue, and that there's no easy solution. At various times
there has been discussion of using some other parser generator, and
I've also toyed with the idea of writing one specifically for
PostgreSQL. Unfortunately, it seems like bison is all but
unmaintained; the alternatives are immature and have limited adoption
and limited community; and writing something from scratch is a ton of
work.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Feature: temporary materialized views

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-25, Mitar wrote:

> Sometimes materialized views are used to cache a complex query on
> which a client works. But after client disconnects, the materialized
> view could be deleted. Regular VIEWs and TABLEs both have support for
> temporary versions which get automatically dropped at the end of the
> session. It seems it is easy to add the same thing for materialized
> views as well. See attached PoC patch.

I think MVs that are dropped at session end are a sensible feature.  I
probably wouldn't go as far as allowing ON COMMIT actions, though, so
this much effort is the right amount.

I think if you really want to do this you should just use OptTemp, and
delete OptNoLog.  Of course, you need to add tests and patch the docs.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-26 Thread Mitar
Hi!

On Wed, Dec 26, 2018 at 4:38 AM Alvaro Herrera  wrote:
> Sounds like you could do this by fixing concurrent refresh to also work
> when the MV is WITH NO DATA.

Yes, I do not think this would be too hard to fix. I could do this nevertheless.

> > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> > That would then generate reasonable trigger calls.
>
> Right.

I tested this yesterday and performance is 2x worse than the heap
swap on the benchmark I made. So I do not think this is a viable
approach.

I am now looking into simply firing TRUNCATE and INSERT triggers
after the heap swap, simulating the above. I made AFTER STATEMENT
triggers fire and it looks like it is working, only the NEW table is not
populated for some reason. Any suggestions? See attached patch.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index e583989e1d..b79b9be687 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -882,8 +882,54 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
+	Relation	matviewRel;
+	EState		*estate;
+	ResultRelInfo resultRelInfo;
+	TransitionCaptureState *transition_capture;
+
 	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
 	 RecentXmin, ReadNextMultiXactId(), relpersistence);
+
+	matviewRel = heap_open(matviewOid, NoLock);
+
+	/* Prepare to catch AFTER triggers. */
+	AfterTriggerBeginQuery();
+
+	/*
+	 * To fire triggers, we'll need an EState as well as a ResultRelInfo.
+	 */
+	estate = CreateExecutorState();
+
+	InitResultRelInfo(&resultRelInfo,
+	  matviewRel,
+	  0,	/* dummy rangetable index */
+	  NULL,
+	  0);
+	estate->es_result_relations = &resultRelInfo;
+	estate->es_num_result_relations = 1;
+	estate->es_result_relation_info = &resultRelInfo;
+
+	/* Execute AFTER STATEMENT truncate triggers */
+	ExecASTruncateTriggers(estate, &resultRelInfo);
+
+	transition_capture =
+		MakeTransitionCaptureState(matviewRel->trigdesc,
+   RelationGetRelid(matviewRel),
+   CMD_INSERT);
+
+	/* Execute AFTER STATEMENT insertion triggers */
+	ExecASInsertTriggers(estate, &resultRelInfo, transition_capture);
+
+	/* Handle queued AFTER triggers */
+	AfterTriggerEndQuery(estate);
+
+	/* Close any trigger target relations */
+	ExecCleanUpTriggerState(estate);
+
+	/* We can clean up the EState now */
+	FreeExecutorState(estate);
+
+	heap_close(matviewRel, NoLock);
 }
 
 /*


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
David Fetter  writes:
> On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
>> In my hands, the only part of the low-level parsing code that
>> commonly shows up as interesting in profiles is the Bison engine.

> Should we be considering others?

We've looked around before, IIRC, and not really seen any arguably
better tools.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas  writes:
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work.  :-(

Yeah, and also: SQL is a damn big and messy language, and so it's not
very clear that it's really bison's fault that it's slow to parse.
We might do a ton of work to implement an alternative, and then find
ourselves no better off.

regards, tom lane



Re: Feature: temporary materialized views

2018-12-26 Thread Mitar
Hi!

On Wed, Dec 26, 2018 at 9:00 AM Alvaro Herrera  wrote:
> I think MVs that are dropped at session end are a sensible feature.

Thanks.

> I probably wouldn't go as far as allowing ON COMMIT actions, though

I agree. I do not see much use for it. The only use case I can
think of would be to support REFRESH as an ON COMMIT action. That
might be useful in the MV setting: after every transaction in my
session, REFRESH this materialized view.

But personally I do not have a use case for that, so I will leave it
to somebody else. :-)

> I think if you really want to do this you should just use OptTemp, and
> delete OptNoLog.

Sounds good.

OptTemp seems to have a misleading warning in some cases when it is
not used on tables though:

"GLOBAL is deprecated in temporary table creation"

Should we change this language to something else? "GLOBAL is
deprecated in temporary object creation"? Based on grammar it seems to
be used for tables, views, sequences, and soon materialized views.

> Of course, you need to add tests and patch the docs.

Sure.

[1] https://www.postgresql.org/message-id/29165.1545842105%40sss.pgh.pa.us


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Feature: temporary materialized views

2018-12-26 Thread Pavel Stehule
st 26. 12. 2018 v 18:20 odesílatel Mitar  napsal:

> Hi!
>
> On Wed, Dec 26, 2018 at 9:00 AM Alvaro Herrera 
> wrote:
> > I think MVs that are dropped at session end are a sensible feature.
>
> Thanks.
>
> > I probably wouldn't go as far as allowing ON COMMIT actions, though
>
> I agree. I do not see much use for it. The only use case I can
> think of would be to support REFRESH as an ON COMMIT action. That
> might be useful in the MV setting: after every transaction in my
> session, REFRESH this materialized view.
>
> But personally I do not have a use case for that, so I will leave it
> to somebody else. :-)
>
> > I think if you really want to do this you should just use OptTemp, and
> > delete OptNoLog.
>
> Sounds good.
>
> OptTemp seems to have a misleading warning in some cases when it is
> not used on tables though:
>
> "GLOBAL is deprecated in temporary table creation"
>
> Should we change this language to something else? "GLOBAL is
> deprecated in temporary object creation"? Based on grammar it seems to
> be used for tables, views, sequences, and soon materialized views.
>

This message is wrong; probably better would be "GLOBAL temporary tables
are not supported".

Regards

Pavel

>
> > Of course, you need to add tests and patch the docs.
>
> Sure.
>
> [1] https://www.postgresql.org/message-id/29165.1545842105%40sss.pgh.pa.us
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
>


Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Robert Haas
On Wed, Dec 26, 2018 at 11:43 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Since it's not fixing a bug, we wouldn't back-patch that into existing
> > releases.  But I agree that we should do something like this for
> > PostgreSQL 12, and I think we should make it user configurable.
>
> I'm -1 on making this user configurable via a GUC; that adds documentation
> and compatibility burdens that we don't need, for something of no value
> to 99.99% of users.  The fact that the default would need to be
> platform-dependent just makes that tradeoff even worse.  I think the other
> 0.01% who need to change the default (and are bright enough to be doing
> the right thing for the right reasons) could certainly handle something
> like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
> for a precedent that I think applies well here.  So I'd favor just doing
> it that way.

I disagree.  I think there is a growing body of evidence that
b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 killed performance on many
types of non-Linux systems.  This is the first report I recall about
AIX, but there have been previous complaints about some BSD variants.

When I was working on developing that commit, I went and tried to find
out all of the different ways of getting some shared memory from
various operating systems and compared them.  Anonymous shared memory
allocated via mmap() was the hands-down winner in almost every
respect: supported on many systems, no annoying operating system
limits, automatic deallocation when the last process exits.  It had
the disadvantage that it didn't have an equivalent of nattch, which
meant that we had to keep a small System V segment around just for
that purpose, but otherwise it looked really good.

However, I only considered the situation from a functional point of
view. I never considered the possibility that the method used to
obtain shared memory from the operating system would affect the
performance of that shared memory. To my surprise, however, it does,
and on multiple operating systems from various parts of the UNIX
family tree. If I'd known that at the time, that commit probably would
not have gone into the tree in the form that it did.  I suspect that
there would have been a loud clamor for configurability, and I think I
would have agreed.

You may be right that this is of no value to a high percentage our
users, but I think that's only because a high percentage of our users
run Linux or Windows, which happen not to be affected. I'm rather
proud, though, of PostgreSQL's long history of trying to be
cross-platform. Even if operating systems like AIX or BSD are a small
percentage of the overall user base, I think it's totally fair to add
a GUC which would likely be helpful to a large percentage of those people,
and I think the GUC proposed here likely falls into that category.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Offline enabling/disabling of data checksums

2018-12-26 Thread Robert Haas
On Fri, Dec 21, 2018 at 6:28 PM Michael Paquier  wrote:
> 2) Which kind of interface do we want to use?  When I did my own
> flavor of pg_checksums, I used an --action switch able to use the
> following values:
> - enable
> - disable
> - verify
> The switch cannot be specified twice (perhaps we could enforce the
> last value winning, as other binaries in the tree do; not sure if
> that's appropriate here).  A second type of interface is to use one
> switch per action.  For both interfaces, if no action is specified then
> the tool fails.  Vote is open.

I vote for separate switches.  Using the same switch with an argument
seems like it adds typing for no real gain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "repliation" as database name

2018-12-26 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> I found that in the documentation thanks to a notification
> off-list. And after some reconfirmation, what I want to fix is
> only a few lines of comment in pg_hba.conf.sample.

> -# database name, or a comma-separated list thereof. The "all"
> -# keyword does not match "replication". Access to replication
> -# must be enabled in a separate record (see example below).
> +# database name, or a comma-separated list thereof. The "all" keyword
> +# matches all databases. The "replication" keyword matches a physical
> +# replication connection request and it must be enabled in a separate
> +# record (see example below)

Hm, I agree that the para doesn't read very well now, but I think this
could be improved further.  How about something like

# DATABASE can be "all", "sameuser", "samerole", "replication", a
# database name, or a comma-separated list thereof.  The "replication"
# keyword matches replication connection requests (see example below).
# The "all" keyword matches all database names, but not replication
# connections.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Andres Freund
Hi,

On 2018-12-26 11:50:18 -0500, Robert Haas wrote:
> On Wed, Dec 26, 2018 at 11:22 AM Tom Lane  wrote:
> > I think there's a lot of goalpost-moving going on here.  The original
> > idea was to trim the physical size of the data structure, as stated
> > in the thread subject, and just reap whatever cache benefits we got
> > along the way from that.  I am dubious that we actually have any
> > performance problem in this code that needs a big dollop of added
> > complexity to fix.
> 
> I have seen ScanKeywordLookup show up in profiles quite often and
> fairly prominently -- like several percent of total runtime. I'm not
> trying to impose requirements on John's patch, and I agree that
> reducing the physical size of the structure is a good step whether
> anything else is done or not. However, I don't see that as a reason to
> shut down further discussion of other possible improvements.  If his
> patch makes this disappear from profiles, cool, but if it doesn't,
> then sooner or later somebody's going to want to do more.

I agree. And most of the patch would be a pre-requisite for anything
more elaborate anyway.


> FWIW, my bet is this helps but isn't enough to get rid of the problem
> completely.  A 9-step binary search has got to be slower than a really
> well-optimized hash table lookup.

Yea, at least with a non-optimized layout. If we'd used a binary search
optimized lookup order it might be different, but probably at best
equivalent to a good hashtable.
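
For reference, by a binary-search-optimized lookup order I mean something
like the Eytzinger (BFS) layout -- a sketch with integer keys (keyword
lookup would compare strings instead), where slot 1 is the root and the
children of slot k live at 2k and 2k+1, so the hot top levels of the tree
share a handful of cache lines:

    /* Returns the 1-based Eytzinger index of the first element >= key,
     * or 0 if every element is smaller; a[] is 1-indexed, n elements. */
    static int
    eytzinger_lower_bound(const int *a, int n, int key)
    {
        int k = 1;

        while (k <= n)
            k = 2 * k + (a[k] < key);   /* branch-free descent */
        k >>= __builtin_ffs(~k);        /* strip trailing right-child moves
                                         * (plus one step) to reach the answer */
        return k;
    }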

> > In my hands, the only part of the low-level parsing code that commonly
> > shows up as interesting in profiles is the Bison engine.  That's probably
> > because the grammar tables are circa half a megabyte and blow out cache
> > pretty badly :-(.  I don't know of any way to make that better,
> > unfortunately.  I suspect that it's just going to get worse, because
> > people keep submitting additions to the grammar.
> 
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work.  :-(

My bet is, and has been for quite a while, that we'll have to go for a
hand-written recursive-descent type parser.  They can be *substantially*
faster, and performance isn't as affected by the grammar size. And,
about as importantly, they also allow for a lot more heuristics around
grammar errors - I do think we'll soon have to do better than throwing a
generic syntax error for the cases where the grammar doesn't match at
all.

Greetings,

Andres Freund



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Andres Freund
Hi,

On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
> I'm not sure that I understand quite what you have in mind for a
> serialized non-perfect hashtable.  Are you thinking that we'd just
> construct a simplehash and serialize it?

I was basically thinking that we'd have the perl script implement a
simple hash and put the keyword (pointers) into an array, handling
conflicts with the simplest linear probing thinkable. As there's never a
need for modifications, that ought to be fairly simple.
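
For the archives, here is a minimal sketch of the lookup side of that
idea, assuming the generated table is a power-of-two slot array with
empty slots marked NULL (the names and the hash function are made up):

    #include <string.h>

    #define KW_TABLE_SIZE 1024      /* power of two > number of keywords */

    /* hypothetical generator output; NULL marks an empty slot */
    extern const char *const kw_name[KW_TABLE_SIZE];
    extern const short kw_value[KW_TABLE_SIZE];

    static unsigned int
    kw_hash(const char *s)
    {
        unsigned int h = 2166136261u;   /* FNV-1a; anything stable works */

        for (; *s; s++)
            h = (h ^ (unsigned char) *s) * 16777619u;
        return h;
    }

    static int
    kw_lookup(const char *word)
    {
        unsigned int i = kw_hash(word) & (KW_TABLE_SIZE - 1);

        while (kw_name[i] != NULL)      /* linear probe on collision */
        {
            if (strcmp(kw_name[i], word) == 0)
                return kw_value[i];
            i = (i + 1) & (KW_TABLE_SIZE - 1);
        }
        return -1;                      /* not a keyword */
    }

Since the table is generated offline and never modified, there is no
deletion or resizing to worry about, which is what keeps the probing loop
this small.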

Greetings,

Andres Freund



Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2018-12-26 Thread Robert Haas
On Mon, Dec 24, 2018 at 6:08 AM Alexey Kondratov
 wrote:
> I would like to propose a change, which allow CLUSTER, VACUUM FULL and
> REINDEX to modify relation tablespace on the fly.

ALTER TABLE already has a lot of logic that is oriented towards being
able to do multiple things at the same time.  If we added CLUSTER,
VACUUM FULL, and REINDEX to that set, then you could, say, change a
data type, cluster, and change tablespaces all in a single SQL
command.

That would be cool, but probably a lot of work.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-26, Robert Haas wrote:

> On Mon, Dec 24, 2018 at 6:08 AM Alexey Kondratov
>  wrote:
> > I would like to propose a change, which allow CLUSTER, VACUUM FULL and
> > REINDEX to modify relation tablespace on the fly.
> 
> ALTER TABLE already has a lot of logic that is oriented towards being
> able to do multiple things at the same time.  If we added CLUSTER,
> VACUUM FULL, and REINDEX to that set, then you could, say, change a
> data type, cluster, and change tablespaces all in a single SQL
> command.

That's a great observation.

> That would be cool, but probably a lot of work.  :-(

But is it?  ALTER TABLE is already doing one kind of table rewrite
during phase 3, and CLUSTER is just a different kind of table rewrite
(which also happens to reindex), and VACUUM FULL is just a special case of
CLUSTER.  Maybe what we need is an ALTER TABLE variant that executes
CLUSTER's table rewrite during phase 3 instead of its ad-hoc table
rewrite.

As for REINDEX, I think it's valuable to move tablespace together with
the reindexing.  You can already do it with the CREATE INDEX
CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
not going to provide that, and it seems worth doing.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Andres Freund



On December 26, 2018 6:48:31 PM GMT+01:00, Robert Haas  
wrote:
>I disagree.  I think there is a growing body of evidence that
>b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 killed performance on many
>types of non-Linux systems.  This is the first report I recall about
>AIX, but there have been previous complaints about some BSD variants.

Exactly. I think we should have added this a few years ago.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Offline enabling/disabling of data checksums

2018-12-26 Thread Fabien COELHO


Hallo Michael,


It adds a (now mandatory) --action parameter that takes either verify,
enable or disable as its argument.


I'd rather have explicit switches for verify, enable & disable, with 
verify being the default if none is provided.



This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?


I would say yes.


1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.


I'd agree to rename the tool as "pg_checksums".


2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory, but I am not sure it is worth it).


Hmmm. The file is indeed scanned, and "operate" is rather fuzzy.


3. Once that patch is in, there would be a way to disable checksums, so
there'd be a case for also changing the initdb default to enabled, but
that requires further discussion (and maybe another round of benchmarks).


My 0.02€ is that data safety should come first, thus checksums should be 
enabled by default.


About the patch: applies, compiles, "make check" ok.

There is no documentation.

In "scan_file", I would open RW only for enable, but keep RO for verify.

Also, the full page is rewritten... would it make sense to only overwrite 
the checksum part itself?


It seems that the control file is unlinked and then rewritten. If the 
rewriting fails, or the command is interrupted, the user has a problem.


Could the control file simply be opened RW? Otherwise, I would suggest 
renaming the old one (e.g. adding .tmp), writing the new one, then 
unlinking the old one, so that recovering the old state in case of a 
problem is possible.
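
A sketch of the closely related write-then-rename variant, assuming that
rename() within the data directory is atomic (error handling is reduced
to the bare minimum and the function name is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write the new control file beside the old one and rename it into
     * place, so either the old or the new version survives a crash. */
    static int
    rewrite_control_file(const char *path, const void *buffer, size_t size)
    {
        char    tmppath[1024];
        int     fd;

        snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
        fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;
        if (write(fd, buffer, size) != (ssize_t) size ||
            fsync(fd) != 0 ||
            close(fd) != 0)
            return -1;
        return rename(tmppath, path);   /* atomic replacement on POSIX */
    }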


For enable/disable, while the command is running, it should mark the 
cluster as opened to prevent an unwanted database start. I do not see 
where this is done.


--
Fabien.

random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Thanks!  I pushed this with two changes -- one was to reword the docs a
> bit more, and the other was to compute in_sample only if it's going to
> be used (when exceeded is true).  I hope this won't upset any compilers ...
> I wonder if there's any sensible way to verify the behavior in an
> automated fashion.  Who knows what marvels we'd discover about the
> reliability/uniformity of random() across platforms.

So Coverity doesn't like this patch:

2250        in_sample = exceeded &&
2251            log_statement_sample_rate != 0 &&
2252            (log_statement_sample_rate == 1 ||
>>> CID 1441909:  Security best practices violations  (DC.WEAK_CRYPTO)
>>> "random" should not be used for security related applications, as
>>> linear congruential algorithms are too easy to break.
2253             random() <= log_statement_sample_rate * MAX_RANDOM_VALUE);
2254
2255        if ((exceeded && in_sample) || log_duration)

I am not sure I buy the argument that this is a security hazard, but
there are other reasons to question the use of random() here, some of
which you stated yourself above.  Another one is that using random()
for internal purposes interferes with applications' possible use of
drandom() and setseed(), ie an application depending on getting a
particular random series would see different behavior depending on
whether this GUC is active or not.

I wonder whether we should establish a project policy to avoid use
of random() for internal purposes, ie try to get to a point where
drandom() is the only caller in the backend.  A quick grep says
that there's a dozen or so callers, so this patch certainly isn't
the only offender ... but should we make an effort to convert them
all to use, say, pg_erand48()?  I think all the existing callers
could happily share a process-wide random state, so we could make
a wrapper that's no harder to use than random().
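
A sketch of such a wrapper, assuming the process-wide state gets seeded
once at startup (the function name and seed values here are made up):

    #include "postgres.h"       /* pulls in port.h for pg_erand48() */

    /* process-wide state, shared by all internal callers but distinct
     * from the state behind SQL-level drandom()/setseed() */
    static unsigned short pg_internal_rand_seed[3] = {0x330e, 0x1234, 0xabcd};

    /* drop-in substitute for random() for internal purposes;
     * returns a uniform double in [0.0, 1.0) */
    double
    pg_internal_random(void)
    {
        return pg_erand48(pg_internal_rand_seed);
    }

The sampling test above would then become something like
pg_internal_random() <= log_statement_sample_rate, with no
MAX_RANDOM_VALUE scaling needed.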

Another idea, which would be a lot less prone to breakage by
add-on code, is to change drandom() and setseed() to themselves
use pg_erand48() with a private seed.  Then we could positively
promise that their behavior isn't affected by anything else
(and it'd become platform-independent, too, which it probably
isn't today).  There would be a one-time compatibility breakage
for anyone who's depending on the exact current behavior, but
that might be OK considering how weak our guarantees are for it
right now.

regards, tom lane



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 26, 2018 at 11:43 AM Tom Lane  wrote:
>> I'm -1 on making this user configurable via a GUC; that adds documentation
>> and compatibility burdens that we don't need, for something of no value
>> to 99.99% of users.
> ...
> You may be right that this is of no value to a high percentage our
> users, but I think that's only because a high percentage of our users
> run Linux or Windows, which happen not to be affected. I'm rather
> proud, though, of PostgreSQL's long history of trying to be
> cross-platform. Even if operating systems like AIX or BSD are a small
> percentage of the overall user base, I think it's totally fair to add
> a GUC which likely be helpful to a large percentage of those people,
> and I think the GUC proposed here likely falls into that category.

You misread what I said.  I don't say that we shouldn't fix this;
what I'm saying is we should not do so via a user-configurable knob.
We should be able to auto-configure this and just handle it internally.
I have zero faith in the idea that users would set the knob correctly.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Andres Freund  writes:
> My bet is, and has been for quite a while, that we'll have to go for a
> hand-written recursive descent type parser.

I will state right up front that that will happen over my dead body.

It's impossible to write correct RD parsers by hand for any but the most
trivial, conflict-free languages, and what we have got to deal with
is certainly neither of those; moreover, it's a constantly moving target.
We'd be buying into an endless landscape of parser bugs if we go that way.
It's *not* worth it.

regards, tom lane



Re: removal of dangling temp tables

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-16, Michael Paquier wrote:

> On Sat, Dec 15, 2018 at 09:51:31AM -0500, Tom Lane wrote:
> > Alvaro Herrera  writes:
> >> Oh, we already have it!  Sorry, I overlooked it.  With that, it seems
> >> the patch is fairly simple ... I wonder about the locking implications
> >> in autovacuum, though -- the value is set in backends without acquiring
> >> a lock.
> > 
> > I was wondering about that too.  But I think it's probably OK.  If
> > autovacuum observes that (a) a table is old enough to pose a wraparound
> > hazard and (b) its putatively owning backend hasn't yet set
> > tempNamespaceId, then I think it's safe to conclude that that table is
> > removable, despite the theoretical race condition.
> 
> This relies on the fact that the flag gets set by a backend within a
> transaction context, and autovacuum would not yet see temp relations
> associated with it at the moment of the scan of pg_class if the backend
> has not yet committed its namespace creation via the creation of the
> first temp table it uses.

Makes sense, thanks.

I think there are two possible ways forward.  The conservative one is to
just apply the attached patch to branches 9.4-10.  That will let
autovacuum drop tables when the backend is in another database. It may
not solve the problem for the bunch of users that have only one database
that takes the majority of connections, but I think it's worth doing
nonetheless.  I tested the 9.4 instance and it works fine; tables are
deleted as soon as I make the session connection to another database.

The more aggressive action is to backpatch 943576bddcb5 ("Make
autovacuum more aggressive to remove orphaned temp tables") which is
currently only in pg11.  We would put the new PGPROC member at the end
of the struct, to avoid ABI incompatibilities, but it'd bring trouble
for extensions that put PGPROC in arrays.  I checked the code of some
known extensions; found that pglogical uses PGPROC, but only pointers to
it, so it wouldn't be damaged by the proposed change AFAICS.


Another possibly useful change is to make DISCARD ALL and DISCARD TEMP
delete everything in what would be the backend's temp namespace, even if
it hasn't been initialized yet.  This would cover the case where a
connection pooler keeps the connection open for a very long time, which
I think is a common case.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 5aa717be9fe7288cbec043cdea0fb757433f3462
Author: Alvaro Herrera 
AuthorDate: Wed Dec 26 15:12:44 2018 -0300
CommitDate: Wed Dec 26 15:12:45 2018 -0300

Improve autovacuum logic about temp tables nearing XID wraparound

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c18285d690..62a06add56 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2108,11 +2108,18 @@ do_autovacuum(void)
 		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 		{
 			int			backendID;
+			PGPROC	   *proc;
 
 			backendID = GetTempNamespaceBackendId(classForm->relnamespace);
 
-			/* We just ignore it if the owning backend is still active */
-			if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
+			/*
+			 * We just ignore it if the owning backend is still active in the
+			 * same database.
+			 */
+			if (backendID != InvalidBackendId &&
+				(backendID == MyBackendId ||
+				 (proc = BackendIdGetProc(backendID)) == NULL ||
+				 proc->databaseId != MyDatabaseId))
 			{
 				/*
 				 * We found an orphan temp table (which was probably left


Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 10:45 AM Tom Lane  wrote:
> I wonder whether we should establish a project policy to avoid use
> of random() for internal purposes, ie try to get to a point where
> drandom() is the only caller in the backend.  A quick grep says
> that there's a dozen or so callers, so this patch certainly isn't
> the only offender ... but should we make an effort to convert them
> all to use, say, pg_erand48()?  I think all the existing callers
> could happily share a process-wide random state, so we could make
> a wrapper that's no harder to use than random().

I've used setseed() to make nbtree's "getting tired" behavior
deterministic for specific test cases I've developed -- the random()
choice of whether to split a page full of duplicates, or continue
right in search of free space becomes predictable. I've used this to
determine whether my nbtree patch's pg_upgrade'd indexes have
precisely the same behavior as v3 indexes on the master branch
(precisely the same in terms of the structure of the final index
following a bulk load).

I'm not sure whether or not this kind of debugging scenario is worth giving
much weight to going forward, but it's something to consider. It seems
generally useful to be able to force deterministic-ish behavior in a
single session. I don't expect that the test case is even a bit
portable, but the technique was quite effective.

-- 
Peter Geoghegan



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
>> I'm not sure that I understand quite what you have in mind for a
>> serialized non-perfect hashtable.  Are you thinking that we'd just
>> construct a simplehash and serialize it?

> I was basically thinking that we'd have the perl script implement a
> simple hash and put the keyword (pointers) into an array, handling
> conflicts with the simplest linear probing thinkable. As there's never a
> need for modifications, that ought to be fairly simple.

I think it was Knuth who said that when you use hashing, you are putting
a great deal of faith in the average case, because the worst case is
terrible.  The applicability of that to this problem is that if you hit
a bad case (say, a long collision chain affecting some common keywords)
you could end up with poor performance that affects a lot of people for
a long time.  And our keyword list is not so static that you could prove
once that the behavior is OK and then forget about it.

So I'm suspicious of proposals to use simplistic hashing here.

There might well be some value in Robert's idea of keying off the first
letter to get rid of the first few binary-search steps, not least because
those steps are particularly terrible from a cache-footprint perspective.
I'm not sold on doing anything significantly more invasive than that.

regards, tom lane



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Dec 26, 2018 at 10:45 AM Tom Lane  wrote:
>> I wonder whether we should establish a project policy to avoid use
>> of random() for internal purposes, ie try to get to a point where
>> drandom() is the only caller in the backend.  A quick grep says
>> that there's a dozen or so callers, so this patch certainly isn't
>> the only offender ... but should we make an effort to convert them
>> all to use, say, pg_erand48()?  I think all the existing callers
>> could happily share a process-wide random state, so we could make
>> a wrapper that's no harder to use than random().

> I've used setseed() to make nbtree's "getting tired" behavior
> deterministic for specific test cases I've developed -- the random()
> choice of whether to split a page full of duplicates, or continue
> right in search of free space becomes predictable. I've used this to
> determine whether my nbtree patch's pg_upgrade'd indexes have
> precisely the same behavior as v3 indexes on the master branch
> (precisely the same in terms of the structure of the final index
> following a bulk load).

TBH, I'd call it a bug --- maybe even a low-grade security hazard
--- that it's possible to affect that from user level.

In fact, contemplating that for a bit: it is possible, as things
stand in HEAD, for a user to control which of his statements will
get logged if the DBA has enabled log_statement_sample_rate.
It doesn't take a lot of creativity to think of ways to abuse that.
So maybe Coverity had the right idea to start with.

There might well be debugging value in affecting internal PRNG usages,
but let's please not think it's a good idea that that's trivially
reachable from SQL.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread John Naylor
On 12/26/18, Robert Haas  wrote:
> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dispatching based on the
> first letter clips at least 3 steps off the binary search.  I don't
> know whether that's enough to be worthwhile, but it's probably pretty
> simple to implement.

Using radix tree structures for the top couple of node levels is a
known technique to optimize tries that need to be more space-efficient
at lower levels, so this has precedent. In this case there would be a
space trade off of

(alphabet size, rounded up) * (size of index to lower boundary + size
of index to upper boundary) = 32 * (2 + 2) = 128 bytes

which is pretty small compared to what we'll save by offset-based
lookup. On average, there'd be 4.1 binary search steps, which is nice.
I agree it'd be fairly simple to do, and might raise the bar for doing
anything more complex.
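
To make the idea concrete, here's a hedged sketch (the range table, its
generator, and keyword_index() are hypothetical; ScanKeywords is the
existing sorted array, and the input is assumed to be already
lower-cased):

typedef struct
{
	uint16		lo;				/* first keyword with this initial letter */
	uint16		hi;				/* one past the last such keyword */
} KeywordRange;

static const KeywordRange kw_range[26];	/* emitted by the generator script */

static int
keyword_index(const char *word)
{
	unsigned char c = (unsigned char) word[0];
	int			lo,
				hi;

	if (c < 'a' || c > 'z')
		return -1;				/* keywords are stored lower-cased */

	lo = kw_range[c - 'a'].lo;
	hi = kw_range[c - 'a'].hi;
	while (lo < hi)				/* plain binary search within the bucket */
	{
		int			mid = (lo + hi) / 2;
		int			cmp = strcmp(word, ScanKeywords[mid].name);

		if (cmp == 0)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}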

-John Naylor



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 11:31 AM Tom Lane  wrote:
> > I've used setseed() to make nbtree's "getting tired" behavior
> > deterministic for specific test cases I've developed -- the random()
> > choice of whether to split a page full of duplicates, or continue
> > right in search of free space becomes predictable.

> TBH, I'd call it a bug --- maybe even a low-grade security hazard
> --- that it's possible to affect that from user level.

I don't disagree with that conclusion. Deterministic behavior is
always preferable to random behavior when the randomness is not truly
necessary.

> There might well be debugging value in affecting internal PRNG usages,
> but let's please not think it's a good idea that that's trivially
> reachable from SQL.

I hesitate to say that there is much value beyond the value that I've
found in this one instance. Maybe the remaining cases where this
technique could be applied just aren't very interesting.

-- 
Peter Geoghegan



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 11:46 AM Peter Geoghegan  wrote:
> > There might well be debugging value in affecting internal PRNG usages,
> > but let's please not think it's a good idea that that's trivially
> > reachable from SQL.
>
> I hesitate to say that there is much value beyond the value that I've
> found in this one instance. Maybe the remaining cases where this
> technique could be applied just aren't very interesting.

Actually, it looks like there may be several cases that are quite
similar to the "getting tired" case that I took an interest in. You
have spgdoinsert(), gistchoose(), and gin_rand()/dropItem(), just for
starters -- those all seem to affect the final structure of an index.
I'm beginning to think that the technique that I came up with to make
"getting tired" deterministic ought to be supporting as a debugging
option if we're to do away with internal use of the generic/seedable
backend PRNG.

-- 
Peter Geoghegan



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Thomas Munro
On Thu, Dec 27, 2018 at 6:48 AM Robert Haas  wrote:
> On Wed, Dec 26, 2018 at 11:43 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > Since it's not fixing a bug, we wouldn't back-patch that into existing
> > > releases.  But I agree that we should do something like this for
> > > PostgreSQL 12, and I think we should make it user configurable.
> >
> > I'm -1 on making this user configurable via a GUC; that adds documentation
> > and compatibility burdens that we don't need, for something of no value
> > to 99.99% of users.  The fact that the default would need to be
> > platform-dependent just makes that tradeoff even worse.  I think the other
> > 0.01% who need to change the default (and are bright enough to be doing
> > the right thing for the right reasons) could certainly handle something
> > like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
> > for a precedent that I think applies well here.  So I'd favor just doing
> > it that way.
>
> I disagree.  I think there is a growing body of evidence that
> b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 killed performance on many
> types of non-Linux systems.  This is the first report I recall about
> AIX, but there have been previous complaints about some BSD variants.

FTR, I think the main FreeBSD complaint in a nutshell was:

1.  There are/were some lock contention problems on internal VM
objects and page tracking structures (pv entries copied for every
process sharing a VM object).
2.  SysV shmem offers a workaround in the form of sysctl
kern.ipc.shm_use_phys which effectively pins all SysV shared memory
(so it shows up in procstat -v as "ph" rather than "sw", ie it's not
swappable).  "Unmanaged pages never require finding all the instances
of their mappings, so the associated data structure used to find all
mappings (Described in Section 6.13) need not be allocated" (from 6.10
in the D&I of FreeBSD book).
3.  Shared anonymous mmap has no way to ask for that.

That wasn't all, though; see the PDF I mentioned up-thread.  I'm
hoping to learn more about this subject, hence my interest in reviving
that patch.  So far I can't reproduce the effect here, probably due to
lack of cores and probably also various changes that have been made
(but not the main ones described in that report, apparently).

-- 
Thomas Munro
http://www.enterprisedb.com



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan  writes:
> I'm beginning to think that the technique that I came up with to make
> "getting tired" deterministic ought to be supporting as a debugging
> option if we're to do away with internal use of the generic/seedable
> backend PRNG.

I have no objection to providing such a thing as a debug option; I just
don't want it to be reachable from SQL (at least not without pieces
that aren't there normally, e.g. a loadable C function).

Replacing random() might actually make that easier not harder, since
we'd have more control over what happens when.

regards, tom lane



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 12:19 PM Tom Lane  wrote:
> Replacing random() might actually make that easier not harder, since
> we'd have more control over what happens when.

That does seem useful. I'm in favor. But why does the function to seed
the internal PRNG have to be loadable? Can't it just be
superuser-only?

-- 
Peter Geoghegan



Re: Feature: temporary materialized views

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-26, Mitar wrote:

> OptTemp seems to have a misleading warning in some cases when it is
> not used on tables though:
> 
> "GLOBAL is deprecated in temporary table creation"
> 
> Should we change this language to something else? "GLOBAL is
> deprecated in temporary object creation"? Based on grammar it seems to
> be used for tables, views, sequences, and soon materialized views.

I'd just leave those messages alone.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Dec 26, 2018 at 12:19 PM Tom Lane  wrote:
>> Replacing random() might actually make that easier not harder, since
>> we'd have more control over what happens when.

> That does seem useful. I'm in favor. But why does the function to seed
> the internal PRNG have to be loadable? Can't it just be
> superuser-only?

That'd work as far as I'm concerned.  (But I can hear Frost wanting
a built-in role for it ;-))

One thing we'd have to think about if we want to take this seriously
is whether a process-wide PRNG state is really adequate; if you're
trying to make a particular call site be deterministic, you'd likely
wish it weren't interfered with by other call sites.  On the flip
side, having more call sites probably makes things more random and
thus better for normal usage.  Not sure how to resolve that tension
(but I don't want to build a whole lot of new infrastructure here).

regards, tom lane



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Robert Haas
On Wed, Dec 26, 2018 at 1:57 PM Tom Lane  wrote:
> You misread what I said.  I don't say that we shouldn't fix this;
> what I'm saying is we should not do so via a user-configurable knob.
> We should be able to auto-configure this and just handle it internally.
> I have zero faith in the idea that users would set the knob correctly.

I don't see how you can auto-configure a performance vs. usability
trade-off.  Remember, the original reason why we switched away from
doing a large System V shared memory allocation is that such
allocations tend to fail due to OS limits.  People who have those
limits set high enough will presumably prefer System V shared memory
allocation, while those who do not will prefer anonymous shared memory
over failure to start.  I guess we could try System V and fall back to
anonymous shared memory, but I think that's masterminding.  It risks
masking a performance problem that the user would like to notice and
fix.

Also, I doubt whether this is the only reason to have a GUC for this.
For instance, I seem to recall that there was some discussion of it
possibly being useful to let developers switch back to the
all-System-V approach for reasons that I don't recall at the moment,
and it even seems possible that somebody might want to use POSIX
shared memory so that they can open up the file that gets created and
inspect the contents using arbitrary tools.  I definitely agree that
an average user probably won't have any idea how to configure settings
like this, so we will want to think carefully about what the platform
defaults should be, but I also think that a GUC-less solution is
depriving those users who ARE smart enough to set the GUC of the
opportunity to choose the value that works best for them.

Way back in high school somebody gave me a copy of the Camel book, and
it said something along the lines of: A good programming language
makes simple things simple and difficult things possible.  I thought
then, and still think now, that that's a very wise statement, and I
think it also applies to tools other than programming languages, like,
say, databases. Refusing to add a GUC is just deciding that we don't
trust our users with power tools, and that's not a philosophy with
which I can agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up text_position_next with multibyte encodings

2018-12-26 Thread John Naylor
On 12/22/18, Heikki Linnakangas  wrote:
> On 14/12/2018 20:20, John Naylor wrote:
> I'm afraid that script doesn't work as a performance test. The
> position() function is immutable, so the result gets cached in the plan
> cache. All you're measuring is the speed to get the constant from the
> plan cache :-(.

That makes perfect sense now. I should have been more skeptical about
the small and medium sizes having similar times. :/

> I rewrote the same tests with a little C helper function, attached, to
> fix that, and to eliminate any PL/pgSQL overhead.

Thanks for that, I'll probably have occasion to do something like this
for other tests.

> You chose interesting characters for the UTF-8 test. The haystack is a
> repeating pattern of byte sequence EC 99 84, and the needle is a
> repeating pattern of EC 84 B1. In the 'long' test, the lengths in the
> skip table are '2', '1' and '250'. But the search bounces between the
> '2' and '1' cases, and never hits the byte that would allow it to skip
> 250 bytes. Interesting case, I had not realized that that can happen.

Me neither, that was unintentional.
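
(For reference, the textbook byte-wise Horspool skip table is just the
sketch below; the actual varlena.c code also masks the byte to bound the
table size, which is one way such collisions arise.  Illustrative only,
not the real implementation.)

static void
build_skip_table(const unsigned char *needle, int needle_len, int skip[256])
{
	int			i;

	/* a byte that never appears in the needle allows a full-length shift */
	for (i = 0; i < 256; i++)
		skip[i] = needle_len;

	/* bytes in the needle: distance from their last occurrence to the end */
	for (i = 0; i < needle_len - 1; i++)
		skip[needle[i]] = needle_len - 1 - i;
}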

> But I don't think we need to put much weight on that, you could come up
> with other scenarios where the current code has skip table collisions, too.

Okay.

> So overall, I think it's still a worthwhile tradeoff, given that that is
> a worst case scenario. If you choose less pathological UTF-8 codepoints,
> or there is no match or the match is closer to the beginning of the
> string, the patch wins.

On 12/23/18, Tomas Vondra  wrote:
> So, what is the expected speedup in a "good/average" case? Do we have
> some reasonable real-world workload mixing these cases that could be
> used as a realistic benchmark?

I'll investigate some "better" cases.

-John Naylor



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 1:21 PM Tom Lane  wrote:
> One thing we'd have to think about if we want to take this seriously
> is whether a process-wide PRNG state is really adequate; if you're
> trying to make a particular call site be deterministic, you'd likely
> wish it weren't interfered with by other call sites.  On the flip
> side, having more call sites probably makes things more random and
> thus better for normal usage.  Not sure how to resolve that tension
> (but I don't want to build a whole lot of new infrastructure here).

I don't think that you need to resolve the tension -- I'd be fine with
continuing to have a process-wide PRNG that was cordoned-off for
internal use. I care about making behavior deterministic when running
an SQL script within a single backend. This can only be expected to
work when all inputs (non-random and random alike) are carefully
accounted for, including even the mutation of a seed value as the test
case runs. I also found myself disabling synchronized sequential
scans within the same test case I mentioned, because they caused
false positive failures very occasionally. I can imagine a similar
test case where the tester has to worry about autovacuum running
ANALYZE, too (as it happens, these were all INSERT ... SELECT
statements, where that didn't seem to be an issue).

Trying to reason about the behavior of a call site in isolation seems
way too complicated to be practical. In general, there will never be a
standard way to think about this kind of debugging, and it would be
unreasonable to insist on providing any kind of standard framework.
It's a low-level tool, useful only to backend hackers.

-- 
Peter Geoghegan



Re: Small doc tweak for array/string functions

2018-12-26 Thread Tom Lane
Ian Barwick  writes:
> On these pages:
>   - https://www.postgresql.org/docs/current/functions-array.html
>   - https://www.postgresql.org/docs/current/functions-string.html
> we point out via "See also" the existence of aggregate array and string
> functions, but I think it would be useful to also mention the existence
> of string-related array functions and array-related string (regexp) functions
> respectively.

Hmm.  The existing cross-references there feel a bit ad-hoc to me already,
and the proposed additions even more so.  Surely we don't want to conclude
that every function that takes or returns an array needs to be cited on
the functions-array page; and that idea would be even sillier if applied
to strings.  How can we define a less spur-of-the-moment approach to
deciding what to list?

The patch as shown might be just fine, but I'd like to have some rationale
for which things we're listing or not listing.

regards, tom lane



Re: Move regression.diffs of pg_upgrade test suite

2018-12-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/23/18 10:44 PM, Noah Misch wrote:
>> A disadvantage of any change here is that it degrades buildfarm reports, 
>> which
>> recover slowly as owners upgrade to a fixed buildfarm release.  This will be
>> similar to the introduction of --outputdir=output_iso.  On non-upgraded
>> animals, pg_upgradeCheck failures will omit regression.diffs.
>> 
>> I think the right fix, attached, is to use "pg_regress --outputdir" to
>> redirect these files to src/bin/pg_upgrade/tmp_check/regress.

> Seems reasonable.

Do we need to change anything in the buildfarm client to improve its
response to this?  If so, seems like it might be advisable to make a
buildfarm release with the upgrade before committing the change.
Sure, not all owners will update right away, but if they don't even
have the option then we're not in a good place.

regards, tom lane



Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 09:08:23PM +, Tom Lane wrote:
> Fix failure to check for open() or fsync() failures.
> 
> While it seems OK to not be concerned about fsync() failure for a
> pre-existing signal file, it's not OK to not even check for open()
> failure.  This at least causes complaints from static analyzers,
> and I think on some platforms passing -1 to fsync() or close() might
> trigger assertion-type failures.  Also add (void) casts to make clear
> that we're ignoring fsync's result intentionally.
> 
> Oversights in commit 2dedf4d9a, noted by Coverity.

 	fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE,
 						   O_RDWR | PG_BINARY | get_sync_bit(sync_method),
 						   S_IRUSR | S_IWUSR);
-	pg_fsync(fd);
-	close(fd);
+	if (fd >= 0)
+	{
+		(void) pg_fsync(fd);
+		close(fd);
+	}

Wouldn't it be simpler to remove stat() and just call
BasicOpenFilePerm, complaining with FATAL about any failures,
including EACCES, on the way?  The code is racy as designed, even if
that's not a big deal for recovery purposes.
--
Michael




Re: Move regression.diffs of pg_upgrade test suite

2018-12-26 Thread Noah Misch
On Wed, Dec 26, 2018 at 05:02:37PM -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 12/23/18 10:44 PM, Noah Misch wrote:
> >> A disadvantage of any change here is that it degrades buildfarm reports, 
> >> which
> >> recover slowly as owners upgrade to a fixed buildfarm release.  This will 
> >> be
> >> similar to the introduction of --outputdir=output_iso.  On non-upgraded
> >> animals, pg_upgradeCheck failures will omit regression.diffs.
> >> 
> >> I think the right fix, attached, is to use "pg_regress --outputdir" to
> >> redirect these files to src/bin/pg_upgrade/tmp_check/regress.
> 
> > Seems reasonable.
> 
> Do we need to change anything in the buildfarm client to improve its
> response to this?  If so, seems like it might be advisable to make a
> buildfarm release with the upgrade before committing the change.
> Sure, not all owners will update right away, but if they don't even
> have the option then we're not in a good place.

It would have been convenient if, for each test target, PostgreSQL code
decided the list of interesting log files and presented that list for the
buildfarm client to consume.  It's probably overkill to redesign that now,
though.  I also don't think it's of top importance to have unbroken access to
this regression.diffs, because defects that cause this run to fail will
eventually upset "install-check-C" and/or "check".  Even so, it's fine to
patch the buildfarm client in advance of the postgresql.git change:

diff --git a/PGBuild/Modules/TestUpgrade.pm b/PGBuild/Modules/TestUpgrade.pm
index 19b48b3..dfff17f 100644
--- a/PGBuild/Modules/TestUpgrade.pm
+++ b/PGBuild/Modules/TestUpgrade.pm
@@ -117,11 +117,16 @@ sub check
 @checklog = run_log($cmd);
 }
 
+# Pre-2019 runs could create src/test/regress/regression.diffs.  Its
+# inclusion is a harmless no-op for later runs; if another stage
+# (e.g. make_check()) failed and created that file, the run ends before
+# reaching this stage.
 my @logfiles = glob(
 "$self->{pgsql}/contrib/pg_upgrade/*.log
  $self->{pgsql}/contrib/pg_upgrade/log/*
  $self->{pgsql}/src/bin/pg_upgrade/*.log
  $self->{pgsql}/src/bin/pg_upgrade/log/*
+ $self->{pgsql}/src/bin/pg_upgrade/tmp_check/*/*.diffs
  $self->{pgsql}/src/test/regress/*.diffs"
 );
 foreach my $log (@logfiles)



Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Dec 26, 2018 at 09:08:23PM +, Tom Lane wrote:
>> Fix failure to check for open() or fsync() failures.
>> 
>> While it seems OK to not be concerned about fsync() failure for a
>> pre-existing signal file, it's not OK to not even check for open()
>> failure.  This at least causes complaints from static analyzers,

> Wouldn't it be simpler to remove stat() and just call
> BasicOpenFilePerm, complaining with FATAL about any failures,
> including EACCES, on the way?  The code is racy as designed, even if
> that's not a big deal for recovery purposes.

It appears to me that the code is intentionally not worrying about
fsync failure, so it seems wrong for it to FATAL out if it's unable
to open the file to fsync it.  And it surely shouldn't do so if the
file isn't there.

regards, tom lane



RE: removal of dangling temp tables

2018-12-26 Thread Tsunakawa, Takayuki
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> The more aggressive action is to backpatch 943576bddcb5 ("Make autovacuum
> more aggressive to remove orphaned temp tables") which is currently only
> in pg11.  We would put the new PGPROC member at the end of the struct, to
> avoid ABI incompatibilities, but it'd bring trouble for extensions that
> put PGPROC in arrays.  I checked the code of some known extensions; found
> that pglogical uses PGPROC, but only pointers to it, so it wouldn't be
> damaged by the proposed change AFAICS.

+1
I think it is a bug, from a user's perspective, that garbage is left.  I want
to believe that fixing bugs in PostgreSQL itself is prioritized over ABI
compatibility for extensions, if we have to choose one of them.



> Another possibly useful change is to make DISCARD ALL and DISCARD TEMP delete
> everything in what would be the backend's temp namespace, even if it hasn't
> been initialized yet.  This would cover the case where a connection pooler
> keeps the connection open for a very long time, which I think is a common
> case.

That sounds good.


Regards
Takayuki Tsunakawa





Re: removal of dangling temp tables

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-26, Tsunakawa, Takayuki wrote:

> From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> > The more aggressive action is to backpatch 943576bddcb5 ("Make autovacuum
> > more aggressive to remove orphaned temp tables") which is currently only
> > in pg11.  We would put the new PGPROC member at the end of the struct, to
> > avoid ABI incompatibilities, but it'd bring trouble for extensions that
> > put PGPROC in arrays.  I checked the code of some known extensions; found
> > that pglogical uses PGPROC, but only pointers to it, so it wouldn't be
> > damaged by the proposed change AFAICS.
> 
> +1
> I think this is a bug from a user's perspective that garbage is left.
> I want to believe that fixing bugs of PostgreSQL itself are
> prioritized over the ABI compatibility for extensions, if we have to
> choose one of them.

Having been a victim of ABI incompatibility myself, I loathe giving too
much priority to other issues that can be resolved in other ways, so I
don't necessarily support your view on bugs.
That said, I think in this case it shouldn't be a problem, so I'm going
to work on that next.

I haven't got around to creating the abidiff reporting system yet ...

> > Another possibly useful change is to make DISCARD ALL and DISCARD TEMP 
> > delete
> > everything in what would be the backend's temp namespace, even if it hasn't
> > been initialized yet.  This would cover the case where a connection pooler
> > keeps the connection open for a very long time, which I think is a common
> > case.
> 
> That sounds good.

Thanks.  I just tested that the attached patch does the intended.  (I
named it "autovac" but obviously the DISCARD part is not about
autovacuum).  I intend to push this patch first, and later backpatch the
other one.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 89df585b87..d6f3fd5436 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3835,12 +3835,31 @@ RemoveTempRelationsCallback(int code, Datum arg)
 
 /*
  * Remove all temp tables from the temporary namespace.
+ *
+ * If we haven't set up one yet, but one exists from a previous crashed
+ * backend, clean that one; but only do this once in a session's life.
  */
 void
 ResetTempTableNamespace(void)
 {
+	static bool	TempNamespaceCleaned = false;
+
 	if (OidIsValid(myTempNamespace))
 		RemoveTempRelations(myTempNamespace);
+	else if (MyBackendId != InvalidBackendId && !RecoveryInProgress() &&
+			 !TempNamespaceCleaned)
+	{
+		char		namespaceName[NAMEDATALEN];
+		Oid			namespaceId;
+
+		snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
+				 MyBackendId);
+		namespaceId = get_namespace_oid(namespaceName, true);
+		if (OidIsValid(namespaceId))
+			RemoveTempRelations(namespaceId);
+	}
+
+	TempNamespaceCleaned = true;
 }
 
 
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c18285d690..62a06add56 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2108,11 +2108,18 @@ do_autovacuum(void)
 		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 		{
 			int			backendID;
+			PGPROC	   *proc;
 
 			backendID = GetTempNamespaceBackendId(classForm->relnamespace);
 
-			/* We just ignore it if the owning backend is still active */
-			if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
+			/*
+			 * We just ignore it if the owning backend is still active in the
+			 * same database.
+			 */
+			if (backendID != InvalidBackendId &&
+				(backendID == MyBackendId ||
+				 (proc = BackendIdGetProc(backendID)) == NULL ||
+				 proc->databaseId != MyDatabaseId))
 			{
 				/*
 				 * We found an orphan temp table (which was probably left


Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Thomas Munro
On Thu, Dec 27, 2018 at 8:59 AM Thomas Munro
 wrote:
> ... So far I can't reproduce the effect here, probably due to
> lack of cores and probably also various changes that have been made
> (but not the main ones described in that report, apparently).

I decided to blow today's coffee money on a couple of hours on a 40
CPU m4.10xlarge at the Amazon arcade, running "FreeBSD
12.0-RELEASE-amd64-f5af2713-6d84-41f0-95ea-2eaee78f2af4-ami-03b0f822e17669866.4
(ami-0ec55ef66da2c9765)" on a 20GB SSD with UFS.  Some time when there
isn't a commitfest looming I'll try to do some proper testing, but in
a quick and dirty smoke test (-s 600, -S -T 60, shared_buffers=6GB,
for a range of client counts), mmap and sysv were the same, but there
did seem to be a measurable speed-up available at high client counts
with kern.ipc.shm_use_phys=1 and thus for sysv only, for people
prepared to set 3 sysctls and this proposed new GUC.  Maybe the effect
would be greater with bigger shared_buffers or smaller pages (this
test ran on super pages)?  More work required to figure that out.

Graph attached.

clients    mmap    sysv sysv+ph
======= ======= ======= =======
  1   13341   13277   13600
  3   39262   39431   39504
  6   78456   78702   78717
  9  116281  116356  116349
 12  149433  149284  149361
 15  169096  169224  169903
 18  177632  14  178177
 24  202376  202574  203843
 32  227606  229967  231142
 48  291641  292431  302510
 64  287625  287118  298857
 80  277412  278449  293781

-- 
Thomas Munro
http://www.enterprisedb.com


RE: removal of dangling temp tables

2018-12-26 Thread Tsunakawa, Takayuki
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> Having been a victim of ABI incompatibility myself, I loathe giving too much
> priority to other issues that can be resolved in other ways, so I don't
> necessarily support your view on bugs.
> That said, I think in this case it shouldn't be a problem, so I'm going
> to work on that next.

I understand your unfortunate experience...

> Thanks.  I just tested that the attached patch does the intended.  (I named
> it "autovac" but obviously the DISCARD part is not about autovacuum).  I
> intend to push this patch first, and later backpatch the other one.

The patch looks good.  I think this can be committed.

Please don't feel obliged to reply to this mail.


Regards
Takayuki Tsunakawa



Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 05:55:36PM -0500, Tom Lane wrote:
> It appears to me that the code is intentionally not worrying about
> fsync failure, so it seems wrong for it to FATAL out if it's unable
> to open the file to fsync it.  And it surely shouldn't do so if the
> file isn't there.

My point is a bit different though: it seems to me that we could just
call BasicOpenFilePerm() and remove the stat() to do exactly the same
things, simplifying the code.
--
Michael




Re: Offline enabling/disabling of data checksums

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:
>> It adds an (now mandatory) --action parameter that takes either verify,
>> enable or disable as argument.
> 
> I'd rather have explicit switches for verify, enable & disable, and verify
> would be the default if none is provided.

Okay, noted for the separate switches.  But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined.  That feels like a trap for newcomers to this tool.

> For enable/disable, while the command is running, it should mark the cluster
> as opened to prevent an unwanted database start. I do not see where this is
> done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance.  In short, this is a trade-off
between code simplicity and trying to have the tool outsmart users in
the way users use the tool.  I tend to prefer keeping the code simple
and not worry about cases where the users mess up with their
application, as there are many more things which could go wrong.
--
Michael




Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Dec 26, 2018 at 05:55:36PM -0500, Tom Lane wrote:
>> It appears to me that the code is intentionally not worrying about
>> fsync failure, so it seems wrong for it to FATAL out if it's unable
>> to open the file to fsync it.  And it surely shouldn't do so if the
>> file isn't there.

> My point is a bit different though: it seems to me that we could just
> call BasicOpenFilePerm() and remove the stat() to do exactly the same
> things, simplifying the code.

Oh, I see.  Yeah, if we're ignoring errors anyway, the stat calls
seem redundant.

regards, tom lane



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 01:45:00PM -0500, Tom Lane wrote:
> I am not sure I buy the argument that this is a security hazard, but
> there are other reasons to question the use of random() here, some of
> which you stated yourself above.  I wonder whether we should
> establish a project policy to avoid use of random() for internal
> purposes, ie try to get to a point where drandom() is the only
> caller in the backend.

Agreed for all three points.

> A quick grep says that there's a dozen or so callers, so this patch
> certainly isn't the only offender ... but should we make an effort
> to convert them all to use, say, pg_erand48()?  I think all the
> existing callers  could happily share a process-wide random state,
> so we could make a wrapper that's no harder to use than random().

Another possibility would be to extend the use of pg_strong_random() a
bit more, though it is really designed for cases like authentication,
where cryptographically strong random bytes are required.
pg_erand48() would be a good step forward.
--
Michael




Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Dec 26, 2018 at 01:45:00PM -0500, Tom Lane wrote:
>> A quick grep says that there's a dozen or so callers, so this patch
>> certainly isn't the only offender ... but should we make an effort
>> to convert them all to use, say, pg_erand48()?  I think all the
>> existing callers  could happily share a process-wide random state,
>> so we could make a wrapper that's no harder to use than random().

> Another possibility would be to extend a bit more the use of
> pg_strong_random(), though it is designed to really be used in cases
> like authentication where the random bytes are strong for
> cryptography.  pg_erand48() would be a good step forward.

I think pg_strong_random is overkill, and overly expensive, for
most if not all of the existing callers of random().  We already
changed the ones where it's important to be strong ...

One thing I was wondering is if we should try to enforce a policy
like this by putting, say,

#define random() pg_random()

into port.h or so.  That would have the advantages of not having to touch
any existing calls and not having to worry too much about future patches
breaking the policy.  On the other hand, it's conceivable that third-party
extensions might get annoyed with us for hijacking a libc function.
Thoughts?

regards, tom lane



[information] Winter vacation

2018-12-26 Thread Nagaura, Ryohei
Hi all.

-Information-
Members of Fujitsu Japan may not be able to reply during the period below.
PERIOD: 29th December 2018 ~ 6th January 2019

NOTE:
Members of Fujitsu Japan are those whose mail domain is "@jp.fujitsu.com"

Best regards,
-
Ryohei Nagaura





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 03:19:06PM -0300, Alvaro Herrera wrote:
> As for REINDEX, I think it's valuable to move tablespace together with
> the reindexing.  You can already do it with the CREATE INDEX
> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
> not going to provide that, and it seems worth doing.

Even for plain REINDEX that seems useful.
--
Michael




Re: GIN predicate locking slows down valgrind isolationtests tremendously

2018-12-26 Thread Alexander Korotkov
On Tue, Dec 25, 2018 at 1:25 AM Alexander Korotkov
 wrote:
> On Fri, Dec 21, 2018 at 1:50 AM Alexander Korotkov
>  wrote:
> >> Thu, 20 Dec 2018, 2:22 Andres Freund and...@anarazel.de:
> >> On 2018-12-03 16:07:40 -0800, Andres Freund wrote:
> >> > As far as I can tell that increase comes laregely from the new GIN
> >> > tests.  Could one of you please look at keeping the test time increase
> >> > to something more reasonable?
> >>
> >> Ping?
> >>
> >> It's also one of the slowest tests outside of valgrind...
> >
> > I'm going to take a look on that.
>
> BTW, patch for reducing isolation testing for gin predicate locking is
> attached.  Could anybody check its execution time with valgrind (and
> compare with execution time of previous test suite)?

I've managed to run it myself.  I've added the following lines to
src/test/isolation/Makefile to make it handy to run these tests under
valgrind.

ifdef VALGRIND
override with_temp_install += PGCTLTIMEOUT=600 \
valgrind --leak-check=no --gen-suppressions=all \
--suppressions=../../tools/valgrind.supp --time-stamp=yes \
--log-file=pid-%p.log --trace-children=yes \
--trace-children-skip=*/initdb
endif

With only predicate-gin in isolation_schedule, the patched version takes
less than a minute.

real 0m53.866s
user 0m9.472s
sys 0m1.706s

I also ran the unpatched version of the predicate-gin test, but cancelled
it after 30 minutes of waiting.  So, it appears that the patch solves the
problem.  I'm going to commit it if there are no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 08:46:25PM -0500, Tom Lane wrote:
> One thing I was wondering is if we should try to enforce a policy
> like this by putting, say,
> 
> #define random() pg_random()
> 
> into port.h or so.  That would have the advantages of not having to touch
> any existing calls and not having to worry too much about future patches
> breaking the policy.  On the other hand, it's conceivable that third-party
> extensions might get annoyed with us for hijacking a libc function.
> Thoughts?

Not much of a fan of that for random() to be honest, as we are talking
about only 15 callers in the backend code, and it would enforce the
redirection in a low-level library.
--
Michael




Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 08:35:22PM -0500, Tom Lane wrote:
> Oh, I see.  Yeah, if we're ignoring errors anyway, the stat calls
> seem redundant.

For this one, I think that we could simplify as attached (this causes
open() to fail additionally because of the sync flags, but that's not
really worth worrying about).  Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5729867879..34ebf6716e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5307,6 +5307,7 @@ static void
 readRecoverySignalFile(void)
 {
 	struct stat stat_buf;
+	int			fd;
 
 	if (IsBootstrapProcessingMode())
 		return;
@@ -5333,30 +5334,20 @@ readRecoverySignalFile(void)
 	 * If present, standby signal file takes precedence. If neither is present
 	 * then we won't enter archive recovery.
 	 */
-	if (stat(STANDBY_SIGNAL_FILE, &stat_buf) == 0)
+	if ((fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE,
+								O_RDWR | PG_BINARY | get_sync_bit(sync_method),
+								S_IRUSR | S_IWUSR)) >= 0)
 	{
-		int			fd;
-
-		fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
-			   S_IRUSR | S_IWUSR);
-		if (fd >= 0)
-		{
-			(void) pg_fsync(fd);
-			close(fd);
-		}
+		(void) pg_fsync(fd);
+		close(fd);
 		standby_signal_file_found = true;
 	}
-	else if (stat(RECOVERY_SIGNAL_FILE, &stat_buf) == 0)
+	else if ((fd = BasicOpenFilePerm(RECOVERY_SIGNAL_FILE,
+									 O_RDWR | PG_BINARY | get_sync_bit(sync_method),
+									 S_IRUSR | S_IWUSR)) >= 0)
 	{
-		int			fd;
-
-		fd = BasicOpenFilePerm(RECOVERY_SIGNAL_FILE, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
-			   S_IRUSR | S_IWUSR);
-		if (fd >= 0)
-		{
-			(void) pg_fsync(fd);
-			close(fd);
-		}
+		(void) pg_fsync(fd);
+		close(fd);
 		recovery_signal_file_found = true;
 	}
 




Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 5:46 PM Tom Lane  wrote:
> I think pg_strong_random is overkill, and overly expensive, for
> most if not all of the existing callers of random().  We already
> changed the ones where it's important to be strong ...

+1.

There was a controversy a bit like this in the Python community a few
years ago [1]. I don't think you can trust somebody to write Postgres
backend code but not trust them to understand the security issues with
a fast user-space PRNG (I think that I'd be willing to say the same
thing about people that write Python programs of any consequence).

It's always possible to make a change that might stop someone from
introducing a bug. The question ought to be: why this change, and why
now?

[1] https://lwn.net/Articles/657269/
-- 
Peter Geoghegan



Re: pgsql: Fix failure to check for open() or fsync() failures.

2018-12-26 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Dec 26, 2018 at 08:35:22PM -0500, Tom Lane wrote:
>> Oh, I see.  Yeah, if we're ignoring errors anyway, the stat calls
>> seem redundant.

> For this one, I think that we could simplify as attached (this causes
> open() to fail additionally because of the sync flags, but that's not
> really worth worrying).  Thoughts?

Actually, now that I think a bit more, this isn't a good idea.  We want
standby_signal_file_found (resp. recovery_signal_file_found) to get set
if the file exists, even if we're unable to fsync it for some reason.
A counterexample to the proposed patch is that a signal file that's
read-only to the server will get ignored, which it should not be.

regards, tom lane



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan  writes:
> It's always possible to make a change that might stop someone from
> introducing a bug. The question ought to be: why this change, and why
> now?

The point here is not to be cryptographically strong at every single
place where the backend might want a random number; I think we're
all agreed that we don't need that.  To me, the point is to ensure that
the user-accessible random sequence is kept separate from internal uses,
and the potential security exposure in the new random-logging patch is
what justifies getting more worried about this than we were before.

Now, we could probably fix that with some less intrusive patch than
#define'ing random() --- in particular, if we give drandom and setseed
their own private PRNG state, we've really fixed the security exposure
without need to change anything else anywhere.  So maybe we should
just do that and be happy.
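
Roughly like this, say (an illustrative sketch only, not committed
code -- error checks and the exact seeding recipe are elided):

/* float.c: private state, untouched by any internal random() use */
static unsigned short drandom_seed[3] = {0, 0, 0};
static bool drandom_seed_set = false;

Datum
drandom(PG_FUNCTION_ARGS)
{
	if (!drandom_seed_set)
	{
		/* first use in this backend: derive some initial seed */
		uint64		iseed = (uint64) GetCurrentTimestamp() ^
			((uint64) MyProcPid << 32);

		drandom_seed[0] = (unsigned short) iseed;
		drandom_seed[1] = (unsigned short) (iseed >> 16);
		drandom_seed[2] = (unsigned short) (iseed >> 32);
		drandom_seed_set = true;
	}
	PG_RETURN_FLOAT8(pg_erand48(drandom_seed));
}

setseed() would then simply scribble on drandom_seed directly, leaving
the PRNG state used elsewhere in the backend untouched.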

regards, tom lane



Re: [PATCH] kNN for btree

2018-12-26 Thread Alexander Korotkov
Hi!

On Fri, Nov 30, 2018 at 3:02 PM Nikita Glukhov  wrote:
> On 29.11.2018 18:24, Dmitry Dolgov wrote:
> >> On Wed, Sep 26, 2018 at 5:41 PM Nikita Glukhov  
> >> wrote:
> >>
> >> Attached 3rd version of the patches rebased onto the current master.
> >>
> >> Changes from the previous version:
> >> - Added support of INCLUDE columns to get_index_column_opclass() (1st 
> >> patch).
> >> - Added parallel kNN scan support.
> >> - amcanorderbyop() was transformed into ammatchorderby() which takes a 
> >> List of
> >>PathKeys and checks each of them with new function 
> >> match_orderbyop_pathkey()
> >>extracted from match_pathkeys_to_index().  I think that this design can 
> >> be
> >>used in the future to support a mix of ordinary and order-by-op 
> >> PathKeys,
> >>but I am not sure.
> > Hi,
> >
> > Unfortunately, the patch has some conflicts, could you rebase it? In the
> > meantime I'll move it to the next CF, hoping to have more reviewers for this
> > item.
>
> Attached 4th version of the patches rebased onto the current master.

I think this patchset is in good shape in general.  After some rounds
of review, it might be committed during the January commitfest.

For now, I have the following notes.

* 0002-Introduce-ammatchorderby-function-v04.patch

I think match_orderbyop_pathkey() and match_orderbyop_pathkeys()
deserve some high-level comments describing what these functions are
expected to do.

* 0004-Add-kNN-support-to-btree-v04.patch

+ 
+  FIXME!!!
+  To implement the distance ordered (nearest-neighbor) search, we only need
+  to define a distance operator (usually called <->) with a corresponding
+  operator family for distance comparison in the index's operator class.
+  These operators must satisfy the following assumptions for all non-null
+  values A,B,C of the datatype:
+
+  A <-> B = B <-> A                                symmetric law
+  if A = B, then A <-> C = B <-> C                 distance equivalence
+  if (A <= B and B <= C) or (A >= B and B >= C),
+  then A <-> B <= A <-> C                          monotonicity
+ 

What exactly are you going to fix here?  I think you should at least
provide proper formatting for this paragraph.

* 0006-Remove-distance-operators-from-btree_gist-v04.patch

I see you provide btree_gist--1.6.sql and remove btree_gist--1.2.sql.
Note that, in order to allow better checking of extension migrations,
we now provide just a migration script to the new version.  So everybody
installing the new version will go through the migration.  However, in
this particular case we have a mass deletion of former extension objects.
So, I think this case should be an exception to the rule, and it's
good to provide a new version of the extension script here.  Other
opinions?

I see btree_gist--1.5--1.6.sql contains a sophisticated query
updating extension operators to builtin operators.  However, what do
you think about just a long sequence of ALTER OPERATOR FAMILY commands
removing old operators and adding new operators?  It would be longer,
but more predictable and easier for understanding.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 6:39 PM Tom Lane  wrote:
> The point here is not to be cryptographically strong at every single
> place where the backend might want a random number; I think we're
> all agreed that we don't need that.  To me, the point is to ensure that
> the user-accessible random sequence is kept separate from internal uses,
> and the potential security exposure in the new random-logging patch is
> what justifies getting more worried about this than we were before.

I agree that that's the point here.

> Now, we could probably fix that with some less intrusive patch than
> #define'ing random() --- in particular, if we give drandom and setseed
> their own private PRNG state, we've really fixed the security exposure
> without need to change anything else anywhere.  So maybe we should
> just do that and be happy.

+1. I don't like the idea of #define'ing random() myself.

We're already making fairly broad assumptions about our having control
of the backend's PRNG state within InitProcessGlobals(). How should
this affect the new drandom()/setseed() private state, if at all?

-- 
Peter Geoghegan



Re: removal of dangling temp tables

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 08:51:56PM -0300, Alvaro Herrera wrote:
> Having been a victim of ABI incompatibility myself, I loathe giving too
> much priority to other issues that can be resolved in other ways, so I
> don't necessarily support your view on bugs.
> That said, I think in this case it shouldn't be a problem, so I'm going
> to work on that next.

And it is even better if bugs can be fixed, even partially, without any
ABI breakage.  Anyway, not breaking the ABI of PGPROC is why 246a6c8
has not been back-patched to begin with, because we have no idea how
PGPROC is being used and because its interface is public, so if the
intent is to apply 246a6c8 to v10 and down this gets a -1 from me.

Back-patching what you sent in
https://www.postgresql.org/message-id/20181226190834.wsk2wzott5yzrjiq@alvherre.pgsql
is fine for me.

>>> Another possibly useful change is to make DISCARD ALL and DISCARD TEMP 
>>> delete
>>> everything in what would be the backend's temp namespace, even if it hasn't
>>> been initialized yet.  This would cover the case where a connection pooler
>>> keeps the connection open for a very long time, which I think is a common
>>> case.
>> 
>> That sounds good.
> 
> Thanks.  I just tested that the attached patch does the intended.  (I
> named it "autovac" but obviously the DISCARD part is not about
> autovacuum).  I intend to push this patch first, and later backpatch the
> other one.

+   snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
+            MyBackendId);
+   namespaceId = get_namespace_oid(namespaceName, true);

So this is the reverse engineering of GetTempNamespaceBackendId().
Perhaps a dedicated API in namespace.c would be interesting to have?
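Something like this, perhaps (name and placement hypothetical):

/* Return the OID of the temp namespace a backend ID would own, if any. */
Oid
GetTempNamespaceOid(int backendID)
{
	char		namespaceName[NAMEDATALEN];

	snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", backendID);

	/* missing_ok = true: the namespace may never have been created */
	return get_namespace_oid(namespaceName, true);
}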

I would recommend keeping the changes for DISCARD and autovacuum in
separate commits.
--
Michael


signature.asc
Description: PGP signature


Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Thomas Munro
On Thu, Dec 27, 2018 at 3:55 PM Peter Geoghegan  wrote:
> On Wed, Dec 26, 2018 at 6:39 PM Tom Lane  wrote:
> > The point here is not to be cryptographically strong at every single
> > place where the backend might want a random number; I think we're
> > all agreed that we don't need that.  To me, the point is to ensure that
> > the user-accessible random sequence is kept separate from internal uses,
> > and the potential security exposure in the new random-logging patch is
> > what justifies getting more worried about this than we were before.
>
> I agree that that's the point here.

+1, but I wonder if just separating them is enough.  Is our seeding
algorithm good enough for this new purpose?  The initial seed is 100%
predictable to a logged in user (it's made from the backend PID and
backend start time, which we tell you), and not even that hard to
guess from the outside, so I think Coverity's warning is an
understatement in this case.  Even if we separate the PRNG state used
for internal stuff so that users can't clobber its seed from SQL,
wouldn't it be possible to predict which statements will survive the
log sampling filter given easily available information and a good
guess at how many times random() (or whatever similar thing) has been
called so far?
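
(Both seed ingredients are indeed visible from plain SQL, e.g.:

  -- any logged-in user can see the seed ingredients for their backend
  SELECT pg_backend_pid(), backend_start
    FROM pg_stat_activity
   WHERE pid = pg_backend_pid();

so an attacker doesn't even have to guess.)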

> > Now, we could probably fix that with some less intrusive patch than
> > #define'ing random() --- in particular, if we give drandom and setseed
> > their own private PRNG state, we've really fixed the security exposure
> > without need to change anything else anywhere.  So maybe we should
> > just do that and be happy.
>
> +1. I don't like the idea of #define'ing random() myself.

+1, me neither.

-- 
Thomas Munro
http://www.enterprisedb.com



RE: Cache relation sizes?

2018-12-26 Thread Jamison, Kirk
Hello,

I also find this proposed feature to be beneficial for performance, especially
when we want to extend or truncate large tables.
As mentioned by David, currently there is a query latency spike when we make a
generic plan for a partitioned table with many partitions.
I tried applying Thomas' patch for that use case. Aside from measuring the
planning and execution time, I also monitored the lseek calls using a simple
strace, with and without the patch.

Below are the test results.
Setup 8192 table partitions.
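(The setup script itself was not posted; a rough reconstruction, with
assumed table and column names, would be something like the following.)

  -- hypothetical reconstruction of the benchmark setup
  CREATE TABLE t (id int) PARTITION BY RANGE (id);
  DO $$
  BEGIN
      FOR i IN 1..8192 LOOP
          EXECUTE format('CREATE TABLE t_%s PARTITION OF t '
                         'FOR VALUES FROM (%s) TO (%s)', i, i, i + 1);
      END LOOP;
  END $$;
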
(1) set plan_cache_mode = 'force_generic_plan';

  [Without Patch]
prepare select_stmt(int) as select * from t where id = $1;
explain (timing off, analyze) execute select_stmt(8192);
[…]
Planning Time: 1678.680 ms
Execution Time: 643.495 ms

$ strace -p [pid] -e trace=lseek -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.017247           1     16385           lseek

  [With Patch]
[…]
Planning Time: 1596.566 ms
Execution Time: 653.646 ms

$ strace -p [pid] -e trace=lseek -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.009196           1      8192           lseek

It was mentioned in the other thread [1] that creating a generic plan for the
first time is very expensive.
Although this patch did not seem to reduce the planning time for
force_generic_plan, the number of lseek calls was cut in half during the first
execution of the generic plan.


(2) plan_cache_mode = 'auto'
reset plan_cache_mode; -- resets to auto / custom plan

  [Without Patch]
[…]
Planning Time: 768.669 ms
Execution Time: 0.776 ms

$ strace -p [pid] -e trace=lseek -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.015117           2      8193           lseek

  [With Patch]
[…]
Planning Time: 181.690 ms
Execution Time: 0.615 ms

$ strace -p [pid] -e trace=lseek -c
[…]
NO (zero) lseek calls.

Without the patch, there were around 8193 lseek calls.
With the patch applied, there were no lseek calls when creating the custom plan.


(3) set plan_cache_mode = 'force_generic_plan';
-- force it to generic plan again to use the cached plan (no re-planning)

  [Without Patch]
[…]
Planning Time: 14.294 ms
Execution Time: 601.141 ms

$ strace -p [pid] -e trace=lseek -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           lseek

  [With Patch]
[…]
Planning Time: 13.976 ms
Execution Time: 570.518 ms

$ strace -p [pid] -e trace=lseek -c
[…]
NO (zero) lseek calls.


If I did the test correctly, I am not sure why the patch did not affect the
generic-planning performance of a table with many partitions.
However, the number of lseek calls was greatly reduced with Thomas' patch.
I also did not get a considerable speed-up in average latency using
pgbench -S (read-only, unprepared).
I am assuming this might be applicable to other use cases as well.
(I have just tested the patch, but haven't dug into its details yet.)

Would you like to submit this to the commitfest to get more reviews for 
possible idea/patch improvement?


[1] 
https://www.postgresql.org/message-id/flat/CAEepm%3D3SSw-Ty1DFcK%3D1rU-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com


Regards,
Kirk Jamison


RE: Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot

2018-12-26 Thread Kato, Sho
Hi, Amit

Thank you for your reply.

> What do you mean by "since the partitions to access are partial"?

I mean the planner creates scan nodes based on the parameters specified for
EXECUTE, and the backend keeps them in the CachedPlan.
If the CachedPlan does not have a scan node for accessing a partition, planning
is needed.
But if there are a lot of partitions and EXECUTE is executed several times,
planning will rarely be needed, because EXECUTE will probably access only some
of the partitions in most cases.

I'm sorry that I do not understand the mechanism very well, so I do not know
whether I can do it.
This is the idea:

Before:

postgres=# explain execute update_stmt(8);
 QUERY PLAN  
-------------------------------------------------------------
 Update on t  (cost=0.00..382.78 rows=110 width=14)
   Update on t_1
   Update on t_2
   Update on t_3
   Update on t_4
   Update on t_5
   Update on t_6
   Update on t_7
   Update on t_8
   Update on t_9
   Update on t_10
   ->  Seq Scan on t_1  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_2  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_3  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_4  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_5  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_6  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_7  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_8  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_9  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
   ->  Seq Scan on t_10  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)

After:

postgres=# explain execute update_stmt(8);
 QUERY PLAN  
-------------------------------------------------------------
 Update on t  (cost=0.00..382.78 rows=110 width=14)
   Update on t_8
   ->  Seq Scan on t_8  (cost=0.00..38.28 rows=11 width=14)
 Filter: (aid = $1)
 
regards,
> -Original Message-
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Friday, December 21, 2018 5:45 PM
> To: Kato, Sho/加藤 翔 ;
> pgsql-hackers@lists.postgresql.org
> Subject: Re: Speeding up creating UPDATE/DELETE generic plan for
> partitioned table into a lot
> 
> Kato-san,
> 
> On 2018/12/21 15:36, Kato, Sho wrote:
> > Hi,
> > I want to speed up the creation of UPDATE/DELETE generic plan for tables
> partitioned into a lot.
> >
> > Currently, creating a generic plan of UPDATE/DELTE for such table,
> planner creates a plan to scan all partitions.
> > So it takes a very long time.
> > I tried with a table partitioned into 8192, it took 12 seconds.
> >
> > In most cases, since the partitions to access are partial, I think
> planner does not need to create a Scan path for every partition.
> 
> What do you mean by "since the partitions to access are partial"?
> 
> > Is there any better way? For example, can planner create generic plans
> from the parameters specified for EXECUTE?
> 
> Well, a generic plan is, by definition, *not* specific to the values of
> parameters, so it's not clear what you're suggesting here.
> 
> Thanks,
> Amit
> 
> 





Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)

2018-12-26 Thread Evgeniy Efimkin
Hello!
In the latest patch I removed the `FOR ALL TABLES` clause and the `alltables`
parameter; it now looks simpler.
Added a new system view, pg_user_subscription, to allow non-superusers to use
pg_dump and to select additional columns from pg_subscription.
Changed the docs.
Thanks!
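
To make the proposed syntax concrete, usage would look roughly like this
(subscription, publication, and table names are made up):

  CREATE SUBSCRIPTION mysub
      CONNECTION 'host=publisher dbname=postgres'
      PUBLICATION mypub
      FOR TABLE t1, t2;

  ALTER SUBSCRIPTION mysub ADD TABLE t3;
  ALTER SUBSCRIPTION mysub SET TABLE t2, t3;
  ALTER SUBSCRIPTION mysub DROP TABLE t2;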

 
Ефимкин Евгений

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3f2f674a1a..ff8a65a3e4 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -522,12 +522,8 @@
   
 
   
-   To create a subscription, the user must be a superuser.
-  
-
-  
-   The subscription apply process will run in the local database with the
-   privileges of a superuser.
+   To add tables to a subscription, the user must have ownership rights on the
+   table.
   
 
   
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 6dfb2e4d3e..f0a368f90c 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -24,6 +24,9 @@ PostgreSQL documentation
 ALTER SUBSCRIPTION name CONNECTION 'conninfo'
 ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name ADD TABLE table_name [, ...]
+ALTER SUBSCRIPTION name SET TABLE table_name [, ...]
+ALTER SUBSCRIPTION name DROP TABLE table_name [, ...]
 ALTER SUBSCRIPTION name ENABLE
 ALTER SUBSCRIPTION name DISABLE
 ALTER SUBSCRIPTION name SET ( subscription_parameter [= value] [, ... ] )
@@ -44,9 +47,7 @@ ALTER SUBSCRIPTION name RENAME TO <
   
You must own the subscription to use ALTER SUBSCRIPTION.
To alter the owner, you must also be a direct or indirect member of the
-   new owning role. The new owner has to be a superuser.
-   (Currently, all subscription owners must be superusers, so the owner checks
-   will be bypassed in practice.  But this might change in the future.)
+   new owning role.
   
  
 
@@ -137,6 +138,35 @@ ALTER SUBSCRIPTION name RENAME TO <
 

 
+   
+ADD TABLE table_name
+
+ 
+  The ADD TABLE clause will add a new table to the subscription; the table
+  must be present in the publication.
+ 
+
+   
+
+   
+SET TABLE table_name
+
+ 
+  The SET TABLE clause will replace the list of tables in
+  the subscription with the specified one.
+ 
+
+   
+
+   
+DROP TABLE table_name
+
+ 
+   The DROP TABLE clause will remove a table from the subscription.
+ 
+
+   
+

 ENABLE
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 1a90c244fb..04af4e27c7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -24,6 +24,7 @@ PostgreSQL documentation
 CREATE SUBSCRIPTION subscription_name
 CONNECTION 'conninfo'
 PUBLICATION publication_name [, ...]
+[ FOR TABLE table_name [, ...] ]
 [ WITH ( subscription_parameter [= value] [, ... ] ) ]
 
  
@@ -88,6 +89,16 @@ CREATE SUBSCRIPTION subscription_name

 
+   
+FOR TABLE
+
+ 
+  Specifies a list of tables to add to the subscription. All tables listed in
+  the clause must be present in the publication.
+ 
+
+   
+

 WITH ( subscription_parameter [= value] [, ... ] )
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5253837b54..a23b080cfe 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -904,6 +904,27 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_user_subscription AS
+    SELECT
+        S.oid,
+        S.subdbid,
+        S.subname       AS subname,
+        CASE WHEN S.subowner = 0 THEN
+            'public'
+        ELSE
+            A.rolname
+        END AS usename,
+        S.subenabled,
+        CASE WHEN (S.subowner <> 0 AND A.rolname = current_user)
+              OR (SELECT rolsuper FROM pg_authid WHERE rolname = current_user)
+            THEN S.subconninfo
+            ELSE NULL END AS subconninfo,
+        S.subslotname,
+        S.subsynccommit,
+        S.subpublications
+    FROM pg_subscription S
+        LEFT JOIN pg_authid A ON (A.oid = S.subowner);
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
@@ -936,7 +957,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublications)
+GRANT SELECT (tableoid, oid, subdbid, subname,
+	subowner, subenabled, subslotname, subpublications, subsynccommit)
 ON pg_subscription TO public;
 
 
diff --git a/src/backend/commands/su


Re: Feature: triggers on materialized views

2018-12-26 Thread Mitar
Hi!

I did a bit of benchmarking. It seems my version with UPDATE takes
even slightly less time (~5%).


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar  wrote:
>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar  wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter  wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined if this is a
> > viable feature to be eventually included. You gave me some suggestions
> > to improve in my patch (adding tests and so on). Does this mean that
> > the patch should be fully done before a decision is made?
> >
> > Also, the workflow is that I improve things, and resubmit a patch to
> > the mailing list, for now?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert,
> > what currently is being done, I would prefer an update. Then UPDATE
> > trigger operation would happen as well. Maybe the INSERT query could
> > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> > one does UPDATE trigger operation on conflict), and REMOVE changed to
> > remove just rows which were really removed, but not only updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> > be done. But it is just surprising that DELETE trigger is called even
> > when no rows are being deleted in the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are just updated currently get first DELETE trigger
> > called and then INSERT. The issue is that if I am observing this
> > behavior from outside, it makes it unclear when I see DELETE if this
> > means really that a row has been deleted or it just means that later
> > on an INSERT would happen. Now I have to wait for an eventual INSERT
> > to determine that. But how long should I wait? It makes consuming
> > these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker to the user. Instead of updating rendered
> > row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know how exactly.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of
> > full refresh. This is what happens. If we would want to have
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > concurrent version has to do, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So per-statement trigger would have TRUNCATE and INSERT called. And
> > per-row trigger would have TRUNCATE and per-row INSERTs called.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: chained transactions

2018-12-26 Thread Fabien COELHO




Updated patch attached.  The previous (v2) patch apparently didn't apply
anymore.


Second patch applies cleanly, compiles, "make check" ok.

As I do not know much about the SPI stuff, some of the comments below may 
be very stupid.


I'm wary of changing the SPI_commit and SPI_rollback interfaces, which are
certainly being used outside the source tree: it could break countless
callers, and it seems quite unclean for commit and rollback to do
anything other than committing or rolling back.


ISTM that it should be kept as is and only managed from the PL/pgSQL
exec_stmt_* functions, which have to be adapted anyway. That would
minimise changes and not break existing code.


If SPI_* functions are modified, which I would advise against, I find 
keeping the next assignment in the chained case doubtful:


_SPI_current->internal_xact = false;

--
Fabien.



Re: chained transactions

2018-12-26 Thread Fabien COELHO




Updated patch attached.  The previous (v2) patch apparently didn't apply
anymore.


Second patch applies cleanly, compiles, "make check" ok.


Also, about testing, I'd do fewer rounds; 4 seems quite enough.

--
Fabien.



Re: CF app feature request

2018-12-26 Thread Magnus Hagander
On Sun, Dec 23, 2018 at 3:59 PM Alvaro Herrera 
wrote:

> On 2018-Dec-23, Magnus Hagander wrote:
>
> > On Wed, Nov 21, 2018 at 12:52 AM Michael Paquier 
> > wrote:
> >
> > > On Tue, Nov 20, 2018 at 03:30:38PM -0300, Alvaro Herrera wrote:
> > > > On 2018-Nov-20, Tom Lane wrote:
> > > > Certainly not higher than having the dropdown for entry author/reviewer
> > > > be sorted alphabetically ... *wink* *wink*
> > >
> > > More *wink* *wink*
> >
> > Here's a slightly late response to that winking :P
> >
> > They *are* sorted. By lastname. Should I take all this winking to mean that
> > people would prefer them being sorted by firstname? :)
>
> Oh, wow. That's not at all obvious ... the fact that there's Scherbaum
> as the first entry, and Wang at the fifth position, threw me off.
>

Oh, there isn't, there's "ads Scherbaum" as his last name. I'm pretty sure
he does that just to mess with people. And for the second one, it's
Alexandra... But yes, I agree it's confusing, especially when you start
mixing in cultures that may not have firstname/lastname working in the way
that we do in the western world.


> I can find entries easily enough knowing this.  But I think it would
> make more sense to order by the complete string that makes up the name.
>

Ok, I've pushed a change to do first_name sorting instead. Let's see if
that causes less or more confusion :)  But it will at least match the "full
string sorting".

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Alternative to \copy in psql modelled after \g

2018-12-26 Thread Daniel Verite
Fabien COELHO wrote:

> Is this just kind of a bug fix? Beforehand the documentation says "\g fn" 
> sends to file, but that was not happening with COPY, and now it does as it 
> says?

Yes. The doc says about \g:

  Sends the current query buffer to the server for execution. If an
  argument is given, the query's output is written to the named file
  or piped to the given shell command, instead of displaying it as
  usual.

It does not add "unless the query is a COPY", so it seems right
to make that just work, and call it a bug fix.

> A question: if opening the output file fails, should not the query
> be cancelled and an error be reported? Maybe it is too late for
> that. It seems that "SELECT pg_sleep(10) \g /bad/file" fails but
> executes the query nevertheless.

Yes. This part works as documented:

  "The file or command is written to only if the query successfully
  returns zero or more tuples, not if the query fails or is a
  non-data-returning SQL command."

> However, it does not contain any tests (bad:-)

Testing this requires at least some interaction with the OS which I'm
uncomfortable to add. There is a precedent in
regress/sql/hs_standby_allowed.sql doing:

  COPY hs1 TO '/tmp/copy_test'
  \! cat /tmp/copy_test

We could add something like that in psql.sql, but I'm not fond of it
because it assumes a Unix-ish environment, and that it's OK to clobber
the hardcoded /tmp/copy_test should it preexist. I'd rather work with
a dedicated temporary directory. On Linux mktemp -d could be used, but
I don't know about a portable solution, plus POSIX has deprecated
mktemp.
I'm open to ideas on a portable way for psql.sql to test \g writing to a
file or a program, but ATM my inclination is to not add that test.


> nor documentation (hmmm... maybe none needed, though).

\copy has this paragraph:

  "The syntax of this command is similar to that of the SQL COPY
  command. All options other than the data source/destination are as
  specified for COPY. Because of this, special parsing rules apply to
  the \copy meta-command. Unlike most other meta-commands, the entire
  remainder of the line is always taken to be the arguments of \copy,
  and neither variable interpolation nor backquote expansion are
  performed in the arguments".

We could add that COPY TO STDOUT with a redirection might be used as an
alternative. The downside is that with four paragraphs plus one tip,
the explanations on \copy already give a lot to chew on, so is it
worth adding more?
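
If we do, perhaps a one-line example is enough, something like this
(table name and file path are illustrative):

  -- writes the COPY output to the file, much like \copy would
  COPY mytable TO STDOUT (FORMAT csv) \g /tmp/mytable.csv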


> ISTM that overriding copystream on open failures is not necessary, because 
> its value is already NULL in this case.
> 
> I'd suggest that is_pipe should be initialized to false, otherwise it is 
> unclear from the code when it is set before use, and some compilers may 
> complain.

OK, will consider these in the next revision.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: chained transactions

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-26, Fabien COELHO wrote:

> > > Copying & comparing nodes are updated. Should making, outing and reading
> > > nodes also be updated?
> > 
> > TransactionStmt isn't covered by the node serialization functions, so I
> > didn't see anything to update.  What did you have in mind?
> 
> Sigh. I had in mind that the serialization feature would work with all
> possible nodes, not just some of them… which seems quite naïve. The whole
> make/copy/cmp/in/out functions depress me, all this verbose code should be
> automatically generated from struct declarations. I'm pretty sure there are
> hidden bugs in there.

There may well be, but keep in mind that the nodes that have out and
read support are used in view declarations and such (stored rules); they
are used pretty extensively.  Nodes that cannot be part of stored rules
don't need to have read support.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera  
> wrote:
> > But then I'm not clear *why* you would like to do a non-concurrent
> > refresh.
> 
> I mostly wanted to support if for two reasons:
> 
> - completeness: maybe we cannot imagine the use case yet, but somebody
> might in the future

Understood.  We don't like features that fail to work in conjunction
with other features, so this is a good goal to keep in mind.

> - getting trigger calls for initial inserts: you can then create
> materialized view without data, attach triggers, and then run a
> regular refresh; this allows you to have only one code path to process
> any (including initial) changes to the view through notifications,
> instead of handling initial data differently

Sounds like you could do this by fixing concurrent refresh to also work
when the MV is WITH NO DATA.

> > Maybe your situation would be best served by forbidding non-
> > concurrent refresh when the MV contains any triggers.
> 
> If this would be acceptable by the community, I could do it.

I think your chances are 50%/50% that this would appear acceptable.

> > Alternatively, maybe reimplement non-concurrent refresh so that it works
> > identically to concurrent refresh (except with a stronger lock).  Not
> > sure if this implies any performance penalties.
> 
> Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> That would then generate reasonable trigger calls.

Right.

> Are there any existing benchmarks for such operations I could use to
> see if there are any performance changes if I change implementation
> here? Any guidelines how to evaluate this?

Not that I know of.  Typically the developer of a feature comes up with
appropriate performance tests also, targetting average and worst cases.

If the performance worsens with the different implementation, one idea
is to keep both and only use the slow one when triggers are present.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Alternative to \copy in psql modelled after \g

2018-12-26 Thread Fabien COELHO



Hello,


Is this just kind of a bug fix? Beforehand the documentation says "\g fn"
sends to file, but that was not happening with COPY, and now it does as it
says?


Yes. [...]

It does not add "unless the query is a COPY", so it seems right
to make that just work, and call it a bug fix.


Does this suggest backpatching?


However, it does not contain any tests (bad:-)


Testing this requires at least some interaction with the OS which I'm
uncomfortable to add.


Hmmm. This means that "\g filename" goes fully untested:-( A casual grep 
seems to confirm this. Sigh:-(



There is a precedent in regress/sql/hs_standby_allowed.sql doing:

 COPY hs1 TO '/tmp/copy_test'
 \! cat /tmp/copy_test


Indeed. I'm unsure Windows has cat or /tmp, so I do not understand how it
works on such platforms. Maybe I'm missing something.



We could add something like that in psql.sql, but I'm not fond of it
because it assumes a Unix-ish environment,


Yep.


I'm open to ideas on a portable way for psql.sql to test \g writing to a
file or a program, but ATM my inclination is to not add that test.


A relative file could be ok? After some testing, the standard regression 
tests do not cd to some special place, so it may fail?


However TAP tests do that, and I have used this extensively with pgbench, 
so a psql TAP test could do that and other things, such as importing a csv 
file or whatever.



nor documentation (hmmm... maybe none needed, though).


\copy has this paragraph:

 "The syntax of this command is similar to that of the SQL COPY
 command. All options other than the data source/destination are as
 specified for COPY. Because of this, special parsing rules apply to
 the \copy meta-command. Unlike most other meta-commands, the entire
 remainder of the line is always taken to be the arguments of \copy,
 and neither variable interpolation nor backquote expansion are
 performed in the arguments".

We could add that COPY TO STDOUT with a redirection might be used as an
alternative. The downside is that with four paragraphs plus one tip,
the explanations on \copy give already a lot to chew on, so is it
worth to add more?


I'd say that a small paragraph with the tip would be ok.

--
Fabien.



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-12-26 Thread Alexey Kondratov

Greetings,



- Reusing the GUC parser is something I would avoid as well.  Not 
worth

the complexity.
Yes, I don't like it either. I will try to make guc-file.l frontend 
safe.

Any success with that?


I looked into it and found that currently guc-file.c is built as part
of guc.c, so it seems even more complicated to unbind
guc-file.c from the backend. Thus, I have a plan for how to proceed with
the patch:


1) Add guc-file.h and build guc-file.c separately from guc.c

2) Put guc-file.l / guc-file.h into common/*

3) Isolate all backend specific calls in guc-file.l with #ifdef FRONTEND

Though I am not sure whether this work is worth doing, compared with the
extra redundancy of simply keeping a frontend-safe copy of the guc-file.l
lexer. If anyone has thoughts, I would be glad to receive comments.




I have finally worked it out. Now there is a common version of
guc-file.l, and guc-file.c is built separately from guc.c. I had to use a
limited number of #ifndef FRONTEND guards, mostly to replace ereport calls.
Also, ProcessConfigFile and ProcessConfigFileInternal have been moved
into guc.c explicitly, as they are backend specific. So to me this
solution looks much more concise and neat.


Please find the new version of the patch attached. TAP tests have been
updated as well, in order to handle a restore_command specified either on
the command line or in postgresql.conf.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 8a6c9f89f45c9568d95e05b0586d1cc54905e6de Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 src/backend/Makefile  |   4 +-
 src/backend/commands/extension.c  |   1 +
 src/backend/utils/misc/Makefile   |   8 -
 src/backend/utils/misc/guc.c  | 434 +--
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/RewindTest.pm   |  96 +++-
 src/bin/pg_rewind/parsexlog.c | 182 ++-
 src/bin/pg_rewind/pg_rewind.c |  91 +++-
 src/bin/pg_rewind/pg_rewind.h |  12 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/common/Makefile   |   7 +-
 src/{backend/utils/misc => common}/guc-file.l | 514 --
 src/include/common/guc-file.h |  50 ++
 src/include/utils/guc.h   |  39 +-
 src/tools/msvc/Mkvcbuild.pm   |   2 +-
 src/tools/msvc/clean.bat  |   2 +-
 18 files changed, 952 insertions(+), 504 deletions(-)
 rename src/{backend/utils/misc => common}/guc-file.l (60%)
 create mode 100644 src/include/common/guc-file.h

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 25eb043941..ddbe2f3fce 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -186,7 +186,7 @@ distprep:
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	distprep
-	$(MAKE) -C utils/misc	guc-file.c
+	$(MAKE) -C common	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
 
 
@@ -307,7 +307,7 @@ maintainer-clean: distclean
 	  replication/syncrep_scanner.c \
 	  storage/lmgr/lwlocknames.c \
 	  storage/lmgr/lwlocknames.h \
-	  utils/misc/guc-file.c \
+	  common/guc-file.c \
 	  utils/sort/qsort_tuple.c
 
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 31dcfe7b11..ec0367d068 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -47,6 +47,7 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/guc-file.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index a53fcdf188..2e6a879c46 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -25,11 +25,3 @@ override CPPFLAGS += -DPG_KRB_SRVTAB='"$(krb_srvtab)"'
 endif
 
 include $(top_srcdir)/src/backend/common.mk
-
-# guc-file is compiled as part of guc
-guc.o: guc-file.c
-
-# Note: guc-file.c is not deleted by 'make clean',
-# since we want to ship it in distribution tarballs.
-clean:
-	@rm -f lex.yy.c
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..a866503186 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigger.h"
+#include "common/guc-file.h"
 #include "common/string.h"
 #include "funcapi.h"
 #include "jit/jit.h"
@@ -210,7 +211,6 @@ static bool check_r

Re: Change pgarch_readyXlog() to return .history files first

2018-12-26 Thread David Steele

On 12/24/18 1:31 PM, Michael Paquier wrote:

On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote:

Thanks for the lookups.  I can see that the patch applies without
conflicts down to 9.4, and based on the opinions gathered on this
thread back-patching this stuff is the consensus, based on input from
Kyotaro Horiguchi and David Steele (I don't mind much myself).  So,
any objections from others in doing so?


On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed
"only" down to 9.5.  Thanks David for the patch and Horiguchi-san for
the review.


Thanks, Michael!

I'm not too worried about 9.4 since it is currently the oldest supported 
version.  HA users tend to be on the leading edge of the upgrade curve 
and others have the opportunity to upgrade if the reordering will help them.


--
-David
da...@pgmasters.net



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 19, 2018 at 8:01 PM Andres Freund  wrote:
> The last time I looked into perfect hash functions, it wasn't easy to
> find a generator that competed with a decent normal hashtable (in
> particular gperf's are very unconvincing). The added tooling is a
> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
> search, where the latter will usually loose.  Wonder if we shouldn't
> generate a serialized non-perfect hashtable instead. The lookup code for
> a read-only hashtable without concern for adversarial input is pretty
> trivial.

I wonder if we could do something really simple like a lookup based on
the first character of the scan keyword. It looks to me like there are
440 keywords right now, and the most common starting letter is 'c',
which is the first letter of 51 keywords. So dispatching based on the
first letter clips at least 3 steps off the binary search.  I don't
know whether that's enough to be worthwhile, but it's probably pretty
simple to implement.

I'm not sure that I understand quite what you have in mind for a
serialized non-perfect hashtable.  Are you thinking that we'd just
construct a simplehash and serialize it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 19, 2018 at 8:01 PM Andres Freund  wrote:
>> The last time I looked into perfect hash functions, it wasn't easy to
>> find a generator that competed with a decent normal hashtable (in
>> particular gperf's are very unconvincing). The added tooling is a
>> concern imo.  OTOH, we're comparing not with a hashtable, but a binary
>> search, where the latter will usually loose.  Wonder if we shouldn't
>> generate a serialized non-perfect hashtable instead. The lookup code for
>> a read-only hashtable without concern for adversarial input is pretty
>> trivial.

> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dispatching based on the
> first letter clips at least 3 steps off the binary search.  I don't
> know whether that's enough to be worthwhile, but it's probably pretty
> simple to implement.

I think there's a lot of goalpost-moving going on here.  The original
idea was to trim the physical size of the data structure, as stated
in the thread subject, and just reap whatever cache benefits we got
along the way from that.  I am dubious that we actually have any
performance problem in this code that needs a big dollop of added
complexity to fix.

In my hands, the only part of the low-level parsing code that commonly
shows up as interesting in profiles is the Bison engine.  That's probably
because the grammar tables are circa half a megabyte and blow out cache
pretty badly :-(.  I don't know of any way to make that better,
unfortunately.  I suspect that it's just going to get worse, because
people keep submitting additions to the grammar.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread David Fetter
On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
> 
> In my hands, the only part of the low-level parsing code that
> commonly shows up as interesting in profiles is the Bison engine.

Should we be considering others? As I understand it, steps have been
made in this field since yacc was originally designed. Is LALR
actually suitable for languages like SQL, or is it just there for
historical reasons?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-26 Thread Tom Lane
Thomas Munro  writes:
> Since it's not fixing a bug, we wouldn't back-patch that into existing
> releases.  But I agree that we should do something like this for
> PostgreSQL 12, and I think we should make it user configurable.

I'm -1 on making this user configurable via a GUC; that adds documentation
and compatibility burdens that we don't need, for something of no value
to 99.99% of users.  The fact that the default would need to be
platform-dependent just makes that tradeoff even worse.  I think the other
0.01% who need to change the default (and are bright enough to be doing
the right thing for the right reasons) could certainly handle something
like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
for a precedent that I think applies well here.  So I'd favor just doing
it that way.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 26, 2018 at 11:22 AM Tom Lane  wrote:
> I think there's a lot of goalpost-moving going on here.  The original
> idea was to trim the physical size of the data structure, as stated
> in the thread subject, and just reap whatever cache benefits we got
> along the way from that.  I am dubious that we actually have any
> performance problem in this code that needs a big dollop of added
> complexity to fix.

I have seen ScanKeywordLookup show up in profiles quite often and
fairly prominently -- like several percent of total runtime. I'm not
trying to impose requirements on John's patch, and I agree that
reducing the physical size of the structure is a good step whether
anything else is done or not. However, I don't see that as a reason to
shut down further discussion of other possible improvements.  If his
patch makes this disappear from profiles, cool, but if it doesn't,
then sooner or later somebody's going to want to do more.

FWIW, my bet is this helps but isn't enough to get rid of the problem
completely.  A 9-step binary search has got to be slower than a really
well-optimized hash table lookup. In a perfect world the latter
touches the cache line containing the keyword -- which presumably is
already in cache since we just scanned it -- then computes a hash
value without touching any other cache lines -- and then goes straight
to the right entry.  So it touches ONE new cache line.  That might be a
level of optimization that's hard to achieve in practice, but I don't
think it's crazy to want to get there.

> In my hands, the only part of the low-level parsing code that commonly
> shows up as interesting in profiles is the Bison engine.  That's probably
> because the grammar tables are circa half a megabyte and blow out cache
> pretty badly :-(.  I don't know of any way to make that better,
> unfortunately.  I suspect that it's just going to get worse, because
> people keep submitting additions to the grammar.

I'm kinda surprised that you haven't seen ScanKeywordLookup() in
there, but I agree with you that the size of the main parser tables is
a real issue, and that there's no easy solution. At various times
there has been discussion of using some other parser generator, and
I've also toyed with the idea of writing one specifically for
PostgreSQL. Unfortunately, it seems like bison is all but
unmaintained; the alternatives are immature and have limited adoption
and limited community; and writing something from scratch is a ton of
work.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Feature: temporary materialized views

2018-12-26 Thread Alvaro Herrera
On 2018-Dec-25, Mitar wrote:

> Sometimes materialized views are used to cache a complex query on
> which a client works. But after client disconnects, the materialized
> view could be deleted. Regular VIEWs and TABLEs both have support for
> temporary versions which get automatically dropped at the end of the
> session. It seems it is easy to add the same thing for materialized
> views as well. See attached PoC patch.

I think MVs that are dropped at session end are a sensible feature.  I
probably wouldn't go as far as allowing ON COMMIT actions, though, so
this much effort is the right amount.

I think if you really want to do this you should just use OptTemp, and
delete OptNoLog.  Of course, you need to add tests and patch the docs.
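
For concreteness, the proposed behavior is a sketch like this (names are
illustrative):

  -- the MV lives only for the session, like a temporary table or view
  CREATE TEMPORARY MATERIALIZED VIEW session_mv AS
      SELECT g AS id, g * g AS sq FROM generate_series(1, 1000) g;
  REFRESH MATERIALIZED VIEW session_mv;
  -- at disconnect, session_mv is dropped automatically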

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-26 Thread Mitar
Hi!

On Wed, Dec 26, 2018 at 4:38 AM Alvaro Herrera  wrote:
> Sounds like you could do this by fixing concurrent refresh to also work
> when the MV is WITH NO DATA.

Yes, I do not think this would be too hard to fix. I could do this nevertheless.

> > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> > That would then generate reasonable trigger calls.
>
> Right.

I tested this yesterday and performance is 2x worse than heap
swap on the benchmark I made. So I do not think this is a viable
approach.

I am now looking into simply triggering TRUNCATE and INSERT triggers
after heap swap simulating the above. I made AFTER STATEMENT triggers
and it looks like it is working, only NEW table is not populated for
some reason. Any suggestions? See attached patch.
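
For reference, the kind of statement-level trigger being exercised would
look something like this (a sketch of what the proposed feature enables;
names are illustrative):

  CREATE FUNCTION count_new_rows() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
      RAISE NOTICE 'rows inserted: %', (SELECT count(*) FROM new_rows);
      RETURN NULL;
  END $$;

  CREATE TRIGGER mv_refresh_insert AFTER INSERT ON my_matview
      REFERENCING NEW TABLE AS new_rows
      FOR EACH STATEMENT EXECUTE FUNCTION count_new_rows();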


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index e583989e1d..b79b9be687 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -882,8 +882,54 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 static void
 refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
 {
+	Relation	matviewRel;
+	EState		*estate;
+	ResultRelInfo resultRelInfo;
+	TransitionCaptureState *transition_capture;
+
 	finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
 	 RecentXmin, ReadNextMultiXactId(), relpersistence);
+
+	matviewRel = heap_open(matviewOid, NoLock);
+
+	/* Prepare to catch AFTER triggers. */
+	AfterTriggerBeginQuery();
+
+	/*
+	 * To fire triggers, we'll need an EState as well as a ResultRelInfo.
+	 */
+	estate = CreateExecutorState();
+
+	InitResultRelInfo(&resultRelInfo,
+	  matviewRel,
+	  0,	/* dummy rangetable index */
+	  NULL,
+	  0);
+	estate->es_result_relations = &resultRelInfo;
+	estate->es_num_result_relations = 1;
+	estate->es_result_relation_info = &resultRelInfo;
+
+	/* Execute AFTER STATEMENT truncate triggers */
+	ExecASTruncateTriggers(estate, &resultRelInfo);
+
+	transition_capture =
+		MakeTransitionCaptureState(matviewRel->trigdesc,
+   RelationGetRelid(matviewRel),
+   CMD_INSERT);
+
+	/* Execute AFTER STATEMENT insertion triggers */
+	ExecASInsertTriggers(estate, &resultRelInfo, transition_capture);
+
+	/* Handle queued AFTER triggers */
+	AfterTriggerEndQuery(estate);
+
+	/* Close any trigger target relations */
+	ExecCleanUpTriggerState(estate);
+
+	/* We can clean up the EState now */
+	FreeExecutorState(estate);
+
+	heap_close(matviewRel, NoLock);
 }
 
 /*


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
David Fetter  writes:
> On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
>> In my hands, the only part of the low-level parsing code that
>> commonly shows up as interesting in profiles is the Bison engine.

> Should we be considering others?

We've looked around before, IIRC, and not really seen any arguably
better tools.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas  writes:
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work.  :-(

Yeah, and also: SQL is a damn big and messy language, and so it's not
very clear that it's really bison's fault that it's slow to parse.
We might do a ton of work to implement an alternative, and then find
ourselves no better off.

regards, tom lane



Re: Feature: temporary materialized views

2018-12-26 Thread Mitar
Hi!

On Wed, Dec 26, 2018 at 9:00 AM Alvaro Herrera  wrote:
> I think MVs that are dropped at session end are a sensible feature.

Thanks.

> I probably wouldn't go as far as allowing ON COMMIT actions, though

I agree. I do not see much usefulness for it. The only use case I can
think of would be to support REFRESH as an ON COMMIT action. That
would be maybe useful in the MV setting. After every transaction in my
session, REFRESH this materialized view.

But personally I do not have an use case for that, so I will leave it
to somebody else. :-)

> I think if you really want to do this you should just use OptTemp, and
> delete OptNoLog.

Sounds good.

OptTemp seems to have a misleading warning in some cases when it is
not used on tables though:

"GLOBAL is deprecated in temporary table creation"

Should we change this language to something else? "GLOBAL is
deprecated in temporary object creation"? Based on grammar it seems to
be used for tables, views, sequences, and soon materialized views.
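
For context, this is the statement that elicits it today (sketch):

  CREATE GLOBAL TEMPORARY TABLE tt (x int);
  -- WARNING:  GLOBAL is deprecated in temporary table creation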

> Of course, you need to add tests and patch the docs.

Sure.

[1] https://www.postgresql.org/message-id/29165.1545842105%40sss.pgh.pa.us


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Feature: temporary materialized views

2018-12-26 Thread Pavel Stehule
On Wed, Dec 26, 2018 at 18:20 Mitar  wrote:

> Hi!
>
> On Wed, Dec 26, 2018 at 9:00 AM Alvaro Herrera 
> wrote:
> > I think MVs that are dropped at session end are a sensible feature.
>
> Thanks.
>
> > I probably wouldn't go as far as allowing ON COMMIT actions, though
>
> I agree. I do not see much usefulness for it. The only use case I can
> think of would be to support REFRESH as an ON COMMIT action. That
> would be maybe useful in the MV setting. After every transaction in my
> session, REFRESH this materialized view.
>
> But personally I do not have an use case for that, so I will leave it
> to somebody else. :-)
>
> > I think if you really want to do this you should just use OptTemp, and
> > delete OptNoLog.
>
> Sounds good.
>
> OptTemp seems to have a misleading warning in some cases when it is
> not used on tables though:
>
> "GLOBAL is deprecated in temporary table creation"
>
> Should we change this language to something else? "GLOBAL is
> deprecated in temporary object creation"? Based on grammar it seems to
> be used for tables, views, sequences, and soon materialized views.
>

This message is wrong; "GLOBAL temporary tables are not supported" would
probably be better.

Regards

Pavel

>
> > Of course, you need to add tests and patch the docs.
>
> Sure.
>
> [1] https://www.postgresql.org/message-id/29165.1545842105%40sss.pgh.pa.us
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
>


  1   2   >