Re: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-12-06 Thread Noah Misch
On Wed, Nov 21, 2018 at 10:05:34AM -0500, Robert Haas wrote:
> On Tue, Nov 20, 2018 at 1:21 AM Takahashi, Ryohei wrote:
> > My customer uses PostgreSQL on Windows and hits the problem that following 
> > log is written to the server logs too frequently (250 thousand times per 
> > day).
> > "LOG:  could not reserve shared memory region (addr=%p) for child %p:"

> > According to the comment of internal_forkexec(), 
> > pgwin32_ReserveSharedMemoryRegion() sometimes fails if ASLR is active. If 
> > so, I think administrators are not interested in this log since it is a 
> > normal event.

Windows ASLR can't explain your system's behavior.

> You might be right, but maybe we should first try to understand why
> this is happening so frequently.  Maybe it's not that normal.

When PostgreSQL installed that workaround, we weren't aware of any system
where this message was common enough to make the log volume a problem.  I
agree the message is harmful on your system, but I'm not inclined to change it
to DEBUG1 for the benefit of one system.  Can you adopt pgbouncer to reduce
the frequency of starting new backend processes?  That should improve your
performance, too.

Could you collect the information
http://postgr.es/m/20181203053506.gb2860...@rfd.leadboat.com requests?  That
may help us understand your system's unusual behavior.  (The issue in that
thread is related but not identical.)



Re: Should new partitions inherit their tablespace from their parent?

2018-12-06 Thread Michael Paquier
On Sat, Nov 24, 2018 at 10:18:17AM +0900, Michael Paquier wrote:
> On Sat, Nov 24, 2018 at 12:32:36PM +1300, David Rowley wrote:
>> On Fri, 23 Nov 2018 at 17:03, David Rowley wrote:
>> > Here's a patch for that.  Parking here until January's commitfest.
>> 
>> And another, now rebased atop of de38ce1b8 (which I probably should
>> have waited for).

Thanks for the patch, David.  I can see that this patch makes the code
more consistent for partitioned tables and partitioned indexes when it
comes to tablespace handling, which is a very good thing.

-ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
+ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace)
NoStorage looks strange as a routine name for this case.  Would something
like ATExecPartedRelSetTableSpace perhaps be more suitable?

+   else if (stmt->partbound)
+   {
+   RangeVar   *parent;
+   Relationparentrel;
+
+   /*
+* For partitions, when no other tablespace is specified, we default
+* the tablespace to the parent partitioned table's.
+*/
Okay, so the direct parent is what counts, and not the top-most parent.
Could you add a test with multiple levels of partitioned tables, along
the lines of the sketch below?
- parent is in tablespace 1.
- child is created in tablespace 2.
- grandchild is created, which should inherit tablespace 2 from the
child, but not tablespace 1 from the parent.  In the existing example,
as one partition is used to test the top-most parent and another for the
new default, it looks cleaner to create a third partition which would
itself be a partitioned table.
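
A minimal sketch of such a test, with hypothetical tablespace and table
names (not taken from the patch itself):

    CREATE TABLESPACE ts1 LOCATION '/path/to/ts1';
    CREATE TABLESPACE ts2 LOCATION '/path/to/ts2';
    CREATE TABLE parent (a int) PARTITION BY RANGE (a) TABLESPACE ts1;
    CREATE TABLE child PARTITION OF parent
      FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (a) TABLESPACE ts2;
    CREATE TABLE grandchild PARTITION OF child
      FOR VALUES FROM (0) TO (50);
    -- With the patch, grandchild should land in ts2 (the direct parent's
    -- tablespace), not ts1:
    SELECT tablespace FROM pg_tables WHERE tablename = 'grandchild';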
--
Michael




Re: Support custom socket directory in pg_upgrade

2018-12-06 Thread Noah Misch
On Sat, Nov 17, 2018 at 10:15:08PM +0100, Daniel Gustafsson wrote:
> > On 15 Nov 2018, at 22:42, Tom Lane  wrote:
> 
> > Further point about that: pg_regress's method of creating a temp
> > directory under /tmp is secure only on machines with the stickybit
> > set on /tmp; otherwise it's possible for an attacker to rename the
> > temp dir out of the way and inject his own socket.  We agreed that
> > that was an okay risk to take for testing purposes, but I'm much
> > less willing to assume that it's okay for production use with
> > pg_upgrade.
> 
> That’s a good point, it’s not an assumption I’d be comfortable with when it
> deals with system upgrades.

As in https://postgr.es/m/flat/20140329222934.gc170...@tornado.leadboat.com, I
maintain that insecure /tmp is not worth worrying about in any part of
PostgreSQL.



RE: [Todo item] Add entry creation timestamp column to pg_stat_replication

2018-12-06 Thread myungkyu.lim
> I have let this stuff cool down for a couple of days, still it seems to me
> that having one single column makes the most sense when it comes to
> looking at a mostly idle system.  I'll try to finish this one at the
> beginning of next week, for now I am marking as ready for committer.

Thank you for your attention!

Best regards,
Myungkyu, Lim




Re: [Todo item] Add entry creation timestamp column to pg_stat_replication

2018-12-06 Thread Michael Paquier
On Tue, Dec 04, 2018 at 04:24:25PM +0900, myungkyu.lim wrote:
> I think the purpose of this field is reaction-interval checking and
> debugging on an idle system.
> So, merging both is better.

I have let this stuff cool down for a couple of days, still it seems to
me that having one single column makes the most sense when it comes to
looking at a mostly idle system.  I'll try to finish this one at the
beginning of next week, for now I am marking as ready for committer.
--
Michael




Re: Strange OSX make check-world failure

2018-12-06 Thread Tom Lane
Samuel Cochran  writes:
> System Integrity Protection strips dynamic linker (dyld) environment 
> variables, such as DYLD_LIBRARY_PATH, during exec(2) [1]

Yeah.  I wish Apple would just fix that silliness ... I'll spare you the
rant about why it's stupid, but it is.  (BTW, last I looked, it's not
exec(2) per se that's doing the damage; the problem is that we're
invoking a sub-shell that's considered a protected program for some
reason, and it's only the use of that that causes DYLD_LIBRARY_PATH
to get removed from the process environment.)

> so we need to rewrite the load paths inside binaries when relocating them
> during make temp-install before make check on darwin.

Interesting proposal, but I think it needs work.

* As coded, this only fixes the problem for references to libpq, not
any of our other shared libraries.

* It's also unpleasant that it hard-wires knowledge of libpq's version
numbering in a place pretty far removed from anywhere that should know
that.

* Just to be annoying, this won't work at all on 32-bit OSX versions
unless we link everything with -headerpad_max_install_names.  (I know
Apple forgot about 32-bit machines long ago, but our buildfarm hasn't.)

* Speaking of not working, I don't think this "find" invocation will
report any failure exits from install_name_tool.

* This doesn't fix anything for executables that never get installed,
for instance isolationtester.

We could probably fix the first four problems with some more sweat,
but I'm not seeing a plausible answer to the last one.  Overwriting
isolationtester's rpath to make "make check" work would just break
it for "make installcheck".

regards, tom lane



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-06 Thread Ashutosh Bapat
On Thu, Dec 6, 2018 at 1:27 PM Alvaro Herrera wrote:

> On 2018-Dec-06, Amit Langote wrote:
>
> > The partitionwise join related
> > changes in PG 11 moved the add_child_rel_equivalences call in
> > set_append_rel_size such that child EC members would be added even before
> > checking if the child rel is dummy, but for a reason named in the comment
> > above the call:
> >
> >... Even if this child is
> >  * deemed dummy, it may fall on nullable side in a child-join, which
> >  * in turn may participate in a MergeAppend, where we will need the
> >  * EquivalenceClass data structures.
> >
> > However, I think we can skip adding the dummy child EC members here  and
> > instead make it a responsibility of partitionwise join code in joinrels.c
> > to add the needed EC members.  Attached a patch to show what I mean,
> which
> > passes the tests and gives this planning time:
>
> Robert, Ashutosh, any comments on this?  I'm unfamiliar with the
> partitionwise join code.
>

As the comment says, it has to do with the equivalence classes being used
during merge append.  ECs are used to create the pathkeys used for sorting.
Creating a sort node that has a column on the nullable side of an OUTER join
will fail if it doesn't find the corresponding equivalence class.  You may
not notice this if both of the partitions being joined are pruned for some
reason.  Amit's idea to make the partition-wise join code do this may work,
but will add a similar overhead, especially in an N-way partition-wise join,
once those equivalence classes are added.
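
To illustrate the plan shape being discussed, here is a minimal sketch with
hypothetical table names (output omitted):

    CREATE TABLE pt (a int, b text) PARTITION BY RANGE (a);
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
    CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (100) TO (200);
    CREATE TABLE qt (a int, c text) PARTITION BY RANGE (a);
    CREATE TABLE qt1 PARTITION OF qt FOR VALUES FROM (0) TO (100);
    CREATE TABLE qt2 PARTITION OF qt FOR VALUES FROM (100) TO (200);
    SET enable_partitionwise_join = on;
    -- An ordered outer join like this can be planned as a MergeAppend over
    -- per-partition joins, which needs EquivalenceClass members for columns
    -- on the nullable side, even for children that turn out to be dummy.
    EXPLAIN (COSTS OFF)
    SELECT * FROM pt LEFT JOIN qt USING (a) ORDER BY a;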

--
Best Wishes,
Ashutosh Bapat


RE: libpq debug log

2018-12-06 Thread Iwata, Aya
Hi,

> TODO:
> I will add the feature called "logging level" on next version patch. I will
> attach it as soon as possible.
I created the v4 patch; please find it attached.
This patch introduces a "logminlevel" parameter.  level1 and level2 can be
set; level1 is the default.
If level1 is set, it outputs the time spent in the functions, which helps to
find out where the time goes.
If level2 is set, it outputs information on the protocol being exchanged, in
addition to the level1 information.

I would appreciate if you could review my latest patch.

Regards,
Aya Iwata


v4-0001-libpq-trace-log.patch
Description: v4-0001-libpq-trace-log.patch


Re: Strange OSX make check-world failure

2018-12-06 Thread Samuel Cochran
Hi folks 

Forgive me if I'm getting the mailing list etiquette wrong — first time poster.

I ended up sitting next to Thomas Munro at PGDU 2018 and talking about testing. 
While trying to get `make check` running on my MacBook, I think I may have
fixed this issue.

System Integrity Protection strips dynamic linker (dyld) environment variables,
such as DYLD_LIBRARY_PATH, during exec(2) [1], so we need to rewrite the load
paths inside binaries when relocating them during make temp-install before make
check on darwin.  Homebrew does something similar [2].  I've attached a patch
which adjusts the Makefile and gets make check working on my machine with SIP
intact.

Cheers,
Sam

  [1]: 
https://developer.apple.com/library/archive/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html
  [2]: 
https://github.com/Homebrew/brew/blob/77e6a927504c51a1393a0a6ccaf6f2611ac4a9d5/Library/Homebrew/os/mac/keg.rb#L17-L30


On Tue, Sep 18, 2018, at 8:39 AM, Tom Lane wrote:
> Thomas Munro  writes:
> > On Tue, Sep 18, 2018 at 2:14 AM Tom Lane  wrote:
> >> "make check" generally won't work on OSX unless you've disabled SIP:
> >> https://www.howtogeek.com/230424/how-to-disable-system-integrity-protection-on-a-mac-and-why-you-shouldnt/
> 
> > Aha!  It looks like it was important to run "make install" before
> > running those tests.
> 
> Right.  If you don't want to disable SIP, you can work around it by always
> doing "make install" before "make check".  Kind of a PITA though.
> 
>   regards, tom lane
> 
> 
From cdfe53a93453d8cdf12cfaaea13574365fbba66b Mon Sep 17 00:00:00 2001
From: Samuel Cochran 
Date: Fri, 7 Dec 2018 15:27:30 +1100
Subject: [PATCH] Fix `make check` on darwin

System Integrity Protection strips dynamic linker (dyld) environment
variables, such as DYLD_LIBRARY_PATH, during exec(2), so we need to
rewrite the load paths inside binaries when relocating them during make
temp-install before make check on darwin.

https://developer.apple.com/library/archive/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 956fd274cd..48f2e2bcc7 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -390,6 +390,10 @@ ifeq ($(MAKELEVEL),0)
rm -rf '$(abs_top_builddir)'/tmp_install
$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install 
install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+   # darwin doesn't propagate DYLD_* vars due to system integrity
+   # protection, so we need to rewrite the load commands inside the
+   # binaries when relocating them
+   $(if $(filter $(PORTNAME),darwin),find 
'$(abs_top_builddir)/tmp_install$(bindir)' -type f -exec install_name_tool 
-change '$(libdir)/libpq.5.dylib' 
'$(abs_top_builddir)/tmp_install$(libdir)/libpq.5.dylib' {} \;)
 endif
$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C 
'$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install 
>>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done)
 endif


RE: Timeout parameters

2018-12-06 Thread Nagaura, Ryohei
Hi,

There was an invisible space, so I removed it.
I registered with 2019-01 commitfest.

Best regards,
-
Ryohei Nagaura

> -Original Message-
> From: Nagaura, Ryohei [mailto:nagaura.ryo...@jp.fujitsu.com]
> Sent: Thursday, December 6, 2018 2:20 PM
> To: 'Fabien COELHO' ;
> 'pgsql-hack...@postgresql.org' 
> Cc: Yahorau, A. (IBA) 
> Subject: RE: Timeout parameters
> 
> Hi, Fabien.
> 
> Thank you for your review.
> And I'm very sorry to have kept you waiting so long.
> 
> 
> About "socket_timeout"
> 
> > I'm generally fine with giving more access to low-level parameters to
> users.
> > However, I'm not sure I understand the use case you have that needs
> > these new extensions.
> If you face the following situation, this parameter will be needed.
> 1. The connection between the server and the client has been established
> normally.
> 2. A server process has received an SQL statement.
> 3. The server OS can return an ack packet, but it takes time to execute
> the SQL statement or to return the result because the server process is
> very busy.
> 4. The client wants to close the connection while leaving the job to the
> server.
> In this case, "statement_timeout" can't satisfy the requirement at line 4.
> 
> > I think that there is some kind of a misnomer: this is not a
> > socket-level timeout, but a client-side query timeout, so it should be
> named differently?
> Yes, I think so.
> 
> > I'm not sure how to name it, though.
> Me too.
> 
> > I think that the way it works is a little extreme, basically the
> > connection is aborted from within pqWait, and then restarted from scratch.
> >
> > There is no clear way to know about the value of the setting (SHOW,
> > \set, \pset...). Ok, this is already the case of other connection
> parameters.
> If this parameter turns out to be needed, I would like to discuss the design
> and optional functions.
> What do you think?
> I'll correct the "socket_timeout" patch after that.
> 
> 
> About "TCP_USER_TIMEOUT"
> I fixed it based on the previous feedback.
> Would you review it, please?
> 
> > There are no tests.
> Here are the test methods for TCP_USER_TIMEOUT.
> 
> Test of client-side TCP_USER_TIMEOUT:
> [client operation]
> 1. Connect to the DB server:
>   $ psql 'postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=15000'
> 2. Get the port number by the following command:
>   postgres=# select inet_client_port();
> 3. Close the client port from the other console of the client machine.
>Please rewrite "56750" to the number confirmed on line 2.
>   $ iptables -I INPUT -p tcp --dport 56750 -j DROP
> 4. Query the following SQL:
>   postgres=# select pg_sleep(10);
> 5. TCP USER TIMEOUT works correctly if an error message is output to the
> console.
> 
> Test of server-side TCP_USER_TIMEOUT:
> [client operation]
> 1. Connect to the DB server.
> 2. Get the port number by the following command:
>   postgres=# select inet_client_port();
> 3. Set the TCP_USER_TIMEOUT by the following command:
>   postgres=# set tcp_user_timeout=15000;
> 4. Query the following SQL:
>   postgres=# select pg_sleep(10);
> 5. Close the client port from the other console.
>Please rewrite "56750" to the number confirmed on line 2.
>   $ iptables -I INPUT -p tcp --dport 56750 -j DROP
> [server operation]
> 6. Verify the logfile.
> 
> > There is no documentation.
> I made a patch of documentation of TCP USER TIMEOUT.
> 
> Best regards,
> -
> Ryohei Nagaura



document_v2.patch
Description: document_v2.patch


TCP_backend_v2.patch
Description: TCP_backend_v2.patch


TCP_interface_v2.patch
Description: TCP_interface_v2.patch


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2018-12-06 Thread Andrey Lepikhov




On 06.12.2018 11:52, Peter Geoghegan wrote:

On Wed, Dec 5, 2018 at 10:35 PM Andrey Lepikhov
 wrote:

This solution changes the pg_depend relation to solve a problem that
exists only in the regression tests.  It can occur only very rarely in
the partitioning cases, can't it?


I don't think it's a matter of how rarely this will happen. We're
trying to avoid these diagnostic message changes because they are
wrong. I don't think that there is much ambiguity about that.  Still,
it will happen however often the user drops objects belonging to
partition children, which could be quite often.


I want to say that we need to localize the rules for the order of the 
diagnostic messages as much as possible in dependency.c.



I think this decision is somewhat excessive.
Maybe you could consider another approach:
1. The order of dependencies in the 'DROP ... CASCADE' case is a problem of
the test tools, not the DBMS.  Here we can use '\set VERBOSITY terse'.
2. Print all dependencies in findDependentObjects() on a drop error (see the
attachment for a prototype).


You didn't include changes to the regression test output, which seems
like a big oversight...


That was done deliberately, to show the differences in the messages that
this approach introduces.



messages that are represented in the regression tests. Anyway, I don't
think it's acceptable to change the messages like this. It makes them
much less useful.


Could you clarify this?  If I understand correctly, with your solution the
user receives a report about the last inserted internal dependency on the
object.  Why is the full info about the object's internal dependencies much
less useful?


These stability issues keep coming up, which makes a comprehensive
approach seem attractive to me. At least 95% of the test instability
comes from pg_depend.



During the retail index tuple deletion project (heap cleaner subsystem),
the order of tuples in database relations was not fixed.  This caused an
unsteady order of text strings in some check-world test results.
Thus, I realized that the order of messages in the test results is
mostly a game of chance.  For this reason I think it is necessary to find
a more general solution to the message-ordering problem.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: WIP: Avoid creation of the free space map for small tables

2018-12-06 Thread Amit Kapila
On Thu, Dec 6, 2018 at 10:53 PM John Naylor  wrote:
>
> I've added an additional regression test for finding the right block
> and removed a test I thought was redundant. I've kept the test file in
> its own schedule.
>

+# --
+# fsm does a vacuum, and running it in parallel seems to prevent heap truncation.
+# --
+test: fsm
+

It is not clear to me from the comment why running it in parallel
prevents heap truncation.  Can you explain what behavior you are seeing
and what makes you think that running it in parallel caused it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Amit Kapila
On Thu, Dec 6, 2018 at 9:32 PM Robert Haas  wrote:
>
> On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule  wrote:
> > On Thu, Dec 6, 2018 at 16:26, Robert Haas wrote:
> >> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule  
> >> wrote:
> >> > I have a problem imagining it.  When the fill factor is low, there
> >> > is a high risk of heavy fragmentation, or somebody will have to do
> >> > defragmentation.
> >>
> >> I don't understand this.
> >
> > I don't know whether zheap has any tools for eliminating fragmentation
> > of the space on a page.  But I expect that after some set of updates,
> > when the record size is mutable, the free space on a page will become
> > fragmented.  Usually, when you have less memory, fragmentation happens
> > faster.
>
> Still not sure I completely understand, but it's true that zheap
> sometimes needs to compact free space on a page.  For example, if
> you've got a page with a 100-byte hole, and somebody updates a tuple
> to make it 2 bytes bigger, you've got to shift that tuple and any that
> precede it backwards to reduce the size of the hole to 98 bytes, so
> that you can fit the new version of the tuple.  If, later, somebody
> shrinks that tuple back to the original size, you've now got 100 bytes
> of free space on the page, but they are fragmented: 98 bytes in the
> "hole," and 2 bytes following the newly-shrunk tuple.  If someone
> tries to insert a 100-byte tuple in that page, we'll need to
> reorganize the page a second time to bring all that free space back
> together in a single chunk.
>
> In my view, and I'm not sure if this is how the code currently works,
> we should have just one routine to do a zheap page reorganization
> which can cope with all possible scenarios.  I imagine that you would
> give it the page as it currently exists plus a "minimum tuple size"
> for one or more tuples on the page (which must not be smaller than the
> current size of that tuple, but could be bigger).  It then reorganizes
> the page so that every tuple for which a minimum size was given
> consumes exactly that amount of space, every other tuple consumes the
> minimum possible amount of space, and the remaining space goes into
> the hole.  So if you call this function with no minimal tuple sizes,
> it does a straight defragmentation; if you give it minimum tuple
> sizes, then it rearranges the page to make it suitable for a pending
> in-place update of those tuples.
>

Yeah, the code is along these lines; however, as of now, the API
takes input for one tuple (its offset number and delta space, the
additional space required by an update that makes the tuple bigger).
As of now, we don't have a requirement for multiple tuples, but if a
case arises, I think the API can be adapted.  One more thing we do
during repair-fragmentation is to arrange tuples in their offset order
so that future sequential scans can be faster.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Add pg_partition_root to get top-most parent of a partition tree

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 10:48:59PM -0300, Alvaro Herrera wrote:
> I think adding a pg_partition_root() call to as many pg_partition_tree
> tests as you modified is overkill ... OTOH I'd have one test that
> invokes pg_partition_tree(pg_partition_root(some-partition)) to verify
> that starting from any point in the tree you get the whole tree.

Good idea, thanks for the input.

> I haven't actually tried to write a query that obtains a tree of
> constraints using this, mind ...

Sure.  It would be good to agree on an interface.  I have not tried
either, but you should be able to get away with joining the relid returned
by pg_partition_tree() against pg_constraint.conrelid, together with
pg_get_constraintdef(), instead of a WITH RECURSIVE, no?  Something like
the sketch below.
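
A rough sketch of such a query, with a hypothetical table name (untested):

    SELECT p.relid, c.conname, pg_get_constraintdef(c.oid)
    FROM pg_partition_tree('parent_tbl'::regclass) AS p
    JOIN pg_constraint AS c ON c.conrelid = p.relid;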
--
Michael




Re: Add pg_partition_root to get top-most parent of a partition tree

2018-12-06 Thread Alvaro Herrera
I think adding a pg_partition_root() call to as many pg_partition_tree
tests as you modified is overkill ... OTOH I'd have one test that
invokes pg_partition_tree(pg_partition_root(some-partition)) to verify
that starting from any point in the tree you get the whole tree.

I haven't actually tried to write a query that obtains a tree of
constraints using this, mind ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Add pg_partition_root to get top-most parent of a partition tree

2018-12-06 Thread Michael Paquier
Hi all,

Álvaro has faced a use case where it would be useful to have a
function that returns the top-most parent of a partition
tree:
https://postgr.es/m/20181204184159.eue3wlchqrkh4vsc@alvherre.pgsql

This has been mentioned as well on the thread where pg_partition_tree was
discussed, but it got shaved from the committed patch as many things
happened during that discussion.

Attached is a patch to do the work, which includes documentation and
tests.  An argument could be made to include the top-most parent as part
of pg_partition_tree, but it feels more natural to me to have a separate
function.  This makes sure to handle invalid relations by returning
NULL, and it generates an error for an incorrect relkind; see the usage
sketch below.
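
A hypothetical usage example (not taken from the patch's own tests):

    CREATE TABLE root_tbl (a int) PARTITION BY RANGE (a);
    CREATE TABLE mid_tbl PARTITION OF root_tbl
      FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (a);
    CREATE TABLE leaf_tbl PARTITION OF mid_tbl
      FOR VALUES FROM (0) TO (50);
    SELECT pg_partition_root('leaf_tbl'::regclass);  -- should return root_tbl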

I have included as well a fix for the recent crash on pg_partition_tree
I have reported, but let's discuss the crash on its thread here:
https://www.postgresql.org/message-id/20181207010406.go2...@paquier.xyz
The bug fix would most likely get committed first, and I'll rebase this
patch as need be.

I am adding this patch to the CF of January.  I think that Amit should
also be marked as a co-author of this patch, as it is inspired by what
was submitted previously, though I have not reused the code.

Thanks,
--
Michael
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b3336ea9be..dbec132188 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20270,6 +20270,17 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 their partitions, and so on.

   
+  
+   
+pg_partition_root
+pg_partition_root(regclass)
+   
+   regclass
+   
+Return the top-most parent of a partition tree for the given
+partitioned table or partitioned index.
+   
+  
  
 

diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 78dd2b542b..611a33d0e1 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -23,7 +23,38 @@
 #include "funcapi.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 
+/*
+ * Perform several checks on a relation from which some information
+ * related to its partition tree is extracted.  Returns false if the
+ * relation cannot be processed, in which case it is up to the caller
+ * to decide what to do instead of raising an error.
+ */
+static bool
+check_rel_for_partition_info(Oid relid)
+{
+	char		relkind;
+
+	/* Check if relation exists */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+		return false;
+
+	relkind = get_rel_relkind(relid);
+
+	/* Only allow relation types that can appear in partition trees. */
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_FOREIGN_TABLE &&
+		relkind != RELKIND_INDEX &&
+		relkind != RELKIND_PARTITIONED_TABLE &&
+		relkind != RELKIND_PARTITIONED_INDEX)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table, a foreign table, or an index",
+						get_rel_name(relid))));
+
+	return true;
+}
 
 /*
  * pg_partition_tree
@@ -38,20 +69,11 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 {
 #define PG_PARTITION_TREE_COLS	4
 	Oid			rootrelid = PG_GETARG_OID(0);
-	char		relkind = get_rel_relkind(rootrelid);
 	FuncCallContext *funcctx;
 	ListCell  **next;
 
-	/* Only allow relation types that can appear in partition trees. */
-	if (relkind != RELKIND_RELATION &&
-		relkind != RELKIND_FOREIGN_TABLE &&
-		relkind != RELKIND_INDEX &&
-		relkind != RELKIND_PARTITIONED_TABLE &&
-		relkind != RELKIND_PARTITIONED_INDEX)
-		ereport(ERROR,
-				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("\"%s\" is not a table, a foreign table, or an index",
-						get_rel_name(rootrelid))));
+	if (!check_rel_for_partition_info(rootrelid))
+		PG_RETURN_NULL();
 
 	/* stuff done only on the first call of the function */
 	if (SRF_IS_FIRSTCALL())
@@ -152,3 +174,39 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 	/* done when there are no more elements left */
 	SRF_RETURN_DONE(funcctx);
 }
+
+/*
+ * pg_partition_root
+ *
+ * For the given relation part of a partition tree, return its top-most
+ * root parent.
+ */
+Datum
+pg_partition_root(PG_FUNCTION_ARGS)
+{
+	Oid		relid = PG_GETARG_OID(0);
+	Oid		rootrelid;
+	List   *ancestors;
+
+	if (!check_rel_for_partition_info(relid))
+		PG_RETURN_NULL();
+
+	/*
+	 * If the relation is not a partition, return itself as a result.
+	 */
+	if (!get_rel_relispartition(relid))
+		PG_RETURN_OID(relid);
+
+	/* Fetch the top-most parent */
+	ancestors = get_partition_ancestors(relid);
+	rootrelid = llast_oid(ancestors);
+	list_free(ancestors);
+
+	/*
+	 * If the relation is actually a partition, 'rootrelid' has been set to
+	 * the OID of the root table in the partition tree.  It should be
+	 * valid per the previous check for a partition leaf above.
+	 */
+	Assert(OidIsValid(rootrelid));
+	PG_RETURN_OID(rootrelid);
+}
diff --git 

pg_partition_tree crashes for a non-defined relation

2018-12-06 Thread Michael Paquier
Hi all,

While testing another patch, I have bumped into the issue of
$subject...  I should have put some more negative testing from the start
on this stuff, here is a culprit query when passing directly an OID:
select pg_partition_tree(0);

I think that we should make the function return NULL if the given
relation does not exist, as we usually do for system-facing functions.
It is also easier for the caller to know that the relation does not
exist instead of having a plpgsql try/catch wrapper or such.

Thoughts?
--
Michael
diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 78dd2b542b..8748869511 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -23,6 +23,7 @@
 #include "funcapi.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 
 
 /*
@@ -42,6 +43,9 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 	FuncCallContext *funcctx;
 	ListCell  **next;
 
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(rootrelid)))
+		PG_RETURN_NULL();
+
 	/* Only allow relation types that can appear in partition trees. */
 	if (relkind != RELKIND_RELATION &&
 		relkind != RELKIND_FOREIGN_TABLE &&
diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out
index 6b116125e6..2df48fcc1c 100644
--- a/src/test/regress/expected/partition_info.out
+++ b/src/test/regress/expected/partition_info.out
@@ -6,6 +6,12 @@ SELECT * FROM pg_partition_tree(NULL);
 ---+-++---
 (0 rows)
 
+SELECT * FROM pg_partition_tree(0);
+ relid | parentrelid | isleaf | level 
+---+-++---
+   | ||  
+(1 row)
+
 -- Test table partition trees
 CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a);
 CREATE TABLE ptif_test0 PARTITION OF ptif_test
diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql
index 5a76f22b05..9b55a7fe5c 100644
--- a/src/test/regress/sql/partition_info.sql
+++ b/src/test/regress/sql/partition_info.sql
@@ -2,6 +2,7 @@
 -- Tests for pg_partition_tree
 --
 SELECT * FROM pg_partition_tree(NULL);
+SELECT * FROM pg_partition_tree(0);
 
 -- Test table partition trees
 CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a);




RE: speeding up planning with partitions

2018-12-06 Thread Imai, Yoshikazu
Hi, Amit

Thanks for the quick modification.

On Wed, Dec 5, 2018 at 8:26 PM, Amit Langote wrote:
> > 1.
...
> > 5.
> > 0003: line 1620-1621
> > 
> > + * After creating the RelOptInfo for the given child RT index, it goes on 
> > to
> > + * initialize some of its fields base on the parent RelOptInfo.
> > 
> > s/fields base on/fields based on/
> 
> Fixed all of 1-5.

Thanks for fixing.

> > 6.
> > parsenodes.h
> >  906  *inh is true for relation references that should be expanded to 
> > include
> >  907  *inheritance children, if the rel has any.  This *must* be false 
> > for
> >  908  *RTEs other than RTE_RELATION entries.
> > 
> > I think inh can become true now even if RTEKind equals RTE_SUBQUERY, so 
> > latter
> > sentence need to be modified.
> 
> 
> 
> Seems like an existing comment bug.  Why don't you send a patch as you
> discovered it? :)

Thanks, I am pleased with your proposal.  I'll post it as a small fix for
the comment.

> > 7.
> > 0005: line 109-115
... 
> Un-pruned partitions may later become dummy due to contradictory constraints
> or constraint exclusion using normal CHECK constraints, and whether a
> partition is dummy is checked properly by the functions that iterate over
> live_parts.

Ah, I understand: partitions are eliminated by contradictory constraints or
by constraint exclusion, both of which rely on constraints.

> Attached updated patches.  I have a few other changes in mind to make to
> 0001 such that the range table in each child's version of Query contains
> only that child table in place of the original target relation, instead of
> *all* child tables, which is the current behavior.  The current behavior
> makes range_table_mutator a big bottleneck when the number of un-pruned
> target children is large.  But I'm saving it for the next week so that I

OK.  I will continue the review of 0001 before/after you supply the next
patch, keeping those points in mind.

> can prepare for the PGConf.ASIA that's starting on Monday next week.  See
> you there. :)

Yeah, see you there. :)


--
Yoshikazu Imai



Re: Hint and detail punctuation

2018-12-06 Thread Michael Paquier
On Fri, Dec 07, 2018 at 01:27:52AM +0100, Daniel Gustafsson wrote:
> Oh..  sorry about that, I should’ve known better.  Thanks for tidying up!

No problem.  Thanks for the report and for caring about the matter.  So
do I.
--
Michael




Re: Hint and detail punctuation

2018-12-06 Thread Daniel Gustafsson
> On 6 Dec 2018, at 23:51, Michael Paquier  wrote:
> 
> On Thu, Dec 06, 2018 at 09:44:22AM -0300, Alvaro Herrera wrote:
>> LGTM.
> 
> Thanks Álvaro for the additional lookup.  I have committed the patch,
> after updating the regression test output as this stuff was forgotten,
> but that was easy enough to fix.

Oh..  sorry about that, I should’ve known better.  Thanks for tidying up!

cheers ./daniel


Re: don't create storage when unnecessary

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 06:55:52PM -0300, Alvaro Herrera wrote:
> Some time ago, after partitioned indexes had been pushed, I realized
> that even though I didn't want them to have relfilenodes, they did.  And
> looking closer I noticed that *a lot* of relation kinds that didn't need
> relfilenodes, had them anyway.
> 
> This patch fixes that; if no relfilenode is needed, it's not created.
> 
> I didn't verify pg_upgrade behavior across this commit.  Maybe something
> needs tweaking there.
>
> PS: I think it'd be worth following up with this ...
> https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=avcyh...@mail.gmail.com

A macro makes sense to control that.  Now I have to admit that I don't
like your solution.  Wouldn't it be cleaner to assign InvalidOid to
relfilenode in such cases?  The callers of heap_create would need to be
made smarter when they now pass down a relfilenode (looking at you,
DefineIndex!), but that seems way more consistent to me.  Some tests
would also be welcome.
--
Michael




Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-12-06 Thread Bossart, Nathan
On 12/6/18, 4:54 PM, "Michael Paquier"  wrote:
> On Thu, Dec 06, 2018 at 02:43:35PM +0900, Michael Paquier wrote:
>> Why?  A WARNING would be logged if the first unlink() fails, and
>> another, different WARNING would be logged if the subsequent fsync
>> fails.  It looks enough to me to make a distinction between both.  Now,
>> you may have a point in the fact that we could also live with only using
>> unlink() for this code path, as even on repetitive crashes this would
>> take care of removing orphan archive status files consistently.
>
> After sleeping on that, using plain unlink() indeed makes the most
> sense.  Any objections if I move on with that, adding a proper comment
> explaining the choice?  I don't plan to finish wrapping this patch today
> but Monday my time anyway.

That seems reasonable to me.

Nathan



Re: amcheck verification for GiST

2018-12-06 Thread Peter Geoghegan
On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin  wrote:
> (0001-GiST-verification-function-for-amcheck-v2.patch)

Thanks for working on this. Some feedback:

* You do this:

> +/* Check of an internal page. Hold locks on two pages at a time 
> (parent+child). */

This isn't consistent with what you do within verify_nbtree.c, which
deliberately avoids ever holding more than a single buffer lock at a
time, on general principle. That isn't necessarily a reason why you
have to do the same, but it's not clear why you do things that way.
Why isn't it enough to have a ShareLock on the relation? Maybe this is
a sign that it would be a good idea to always operate on a palloc()'d
copy of the page, by introducing something equivalent to
palloc_btree_page(). (That would also be an opportunity to do very
basic checks on every page.)

* You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around.
Certainly, there should be one at the top of the main loop.

* Maybe gist_index_check() should be called gist_index_parent_check(),
since it's rather like the existing verification function
bt_index_parent_check() (see the usage sketch after this list).

* Alternatively, you could find a way to make your function only need
an AccessShareLock -- that would make gist_index_check() an
appropriate name. That would probably require careful thought about
VACUUM.

* Why is it okay to do this?:

> +   if (GistTupleIsInvalid(idxtuple))
> +   ereport(LOG,
> +   (errmsg("index \"%s\" contains an inner tuple marked as 
> invalid",
> +   RelationGetRelationName(rel)),
> +errdetail("This is caused by an incomplete page split at 
> crash recovery before upgrading to PostgreSQL 9.1."),
> +errhint("Please REINDEX it.")));

You should probably mention the gistdoinsert() precedent for this.

* Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as
possible, adding almost any check that I can think of, provided it
hasn't got very high overhead. Note that gistcheckpage() doesn't do
this for you.

* Should we be concerned about the memory used by pushStackIfSplited()?

* How about a cross-check between IndexTupleSize() and
ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that
we have this redundancy, which wastes space, but we do, so we might as
well get some small benefit from it.
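
On the naming point above, for comparison, the existing B-Tree counterpart
is invoked like this (standard amcheck usage; the index name is
hypothetical):

    CREATE EXTENSION amcheck;
    SELECT bt_index_parent_check('some_btree_index'::regclass);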

--
Peter Geoghegan



Re: psql display of foreign keys

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 01:26:30PM -0300, Alvaro Herrera wrote:
> That means I can just get pg_partition_root() done first, and try to
> write the queries using that instead of WITH RECURSIVE.

FWIW, I was just writing a patch about this one, so I was going to spawn
a new thread today :)

Let's definitely avoid WITH RECURSIVE if we can.
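
For reference, the kind of WITH RECURSIVE query that pg_partition_root()
could replace looks roughly like this (a sketch, with a hypothetical
partition name):

    WITH RECURSIVE ancestors(relid) AS (
        SELECT 'some_partition'::regclass::oid
      UNION ALL
        SELECT i.inhparent
        FROM pg_inherits i
        JOIN ancestors a ON i.inhrelid = a.relid
    )
    SELECT relid::regclass FROM ancestors
    WHERE NOT EXISTS (SELECT 1 FROM pg_inherits WHERE inhrelid = relid);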
--
Michael




Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 02:43:35PM +0900, Michael Paquier wrote:
> Why?  A WARNING would be logged if the first unlink() fails, and
> another, different WARNING would be logged if the subsequent fsync
> fails.  It looks enough to me to make a distinction between both.  Now,
> you may have a point in the fact that we could also live with only using
> unlink() for this code path, as even on repetitive crashes this would
> take care of removing orphan archive status files consistently.

After sleeping on that, using plain unlink() indeed makes the most
sense.  Any objections if I move on with that, adding a proper comment
explaining the choice?  I don't plan to finish wrapping this patch today
but Monday my time anyway.
--
Michael




Re: Hint and detail punctuation

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 09:44:22AM -0300, Alvaro Herrera wrote:
> LGTM.

Thanks Álvaro for the additional lookup.  I have committed the patch,
after updating the regression test output as this stuff was forgotten,
but that was easy enough to fix.
--
Michael




Re: don't create storage when unnecessary

2018-12-06 Thread Andres Freund
Hi,

On 2018-12-06 18:55:52 -0300, Alvaro Herrera wrote:
> Some time ago, after partitioned indexes had been pushed, I realized
> that even though I didn't want them to have relfilenodes, they did.  And
> looking closer I noticed that *a lot* of relation kinds that didn't need
> relfilenodes, had them anyway.
> 
> This patch fixes that; if no relfilenode is needed, it's not created.
> 
> I didn't verify pg_upgrade behavior across this commit.  Maybe something
> needs tweaking there.

Hm, that generally sounds like a good plan. Could we back this up with
tests in misc_sanity.sql or such?
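
Perhaps something along these lines (only a sketch; the exact relkind list
would need double-checking against the patch):

    -- relation kinds that should have no storage must have no relfilenode
    SELECT relname, relkind
    FROM pg_class
    WHERE relkind IN ('v', 'c', 'f', 'p', 'I')
      AND relfilenode <> 0;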

- Andres



[PATCH] Allow anonymous rowtypes in function return column definition

2018-12-06 Thread Elvis Pranskevichus
Currently, the following query

SELECT q.b = row(2)
FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record);

would fail with

ERROR:  column "b" has pseudo-type record

This is due to CheckAttributeNamesTypes() being used on a function
coldeflist as if it was a real relation definition.  But in the context
of a query there seems to be no harm in allowing this, as other ways of
manipulating anonymous rowtypes work well, e.g.:

SELECT (ARRAY[ROW(1, ROW(2))])[1];



Elvis
>From 873ecd6b31abc28c787f398d78ba2511c6e712a2 Mon Sep 17 00:00:00 2001
From: Elvis Pranskevichus 
Date: Thu, 6 Dec 2018 17:16:28 -0500
Subject: [PATCH] Allow anonymous rowtypes in function return column definition

Currently, the following query

SELECT q.b = row(2)
FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record);

would fail with

ERROR:  column "b" has pseudo-type record

This is due to CheckAttributeNamesTypes() being used on a function
coldeflist as if it was a real relation definition.  But in the context
of a query there seems to be no harm in allowing this, as other ways of
manipulating anonymous rowtypes work well, e.g.:

SELECT (ARRAY[ROW(1, ROW(2))])[1];
---
 src/backend/catalog/heap.c | 23 +++
 src/backend/catalog/index.c|  2 +-
 src/backend/commands/tablecmds.c   |  4 ++--
 src/backend/parser/parse_relation.c|  2 +-
 src/include/catalog/heap.h |  6 --
 src/test/regress/expected/rowtypes.out |  7 +++
 src/test/regress/sql/rowtypes.sql  |  2 ++
 7 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 8c52a1543d..ab9cb600f1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -412,7 +412,8 @@ heap_create(const char *relname,
  */
 void
 CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
-		 bool allow_system_table_mods)
+		 bool allow_system_table_mods,
+		 bool allow_anonymous_records)
 {
 	int			i;
 	int			j;
@@ -471,7 +472,8 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
 		   TupleDescAttr(tupdesc, i)->atttypid,
 		   TupleDescAttr(tupdesc, i)->attcollation,
 		   NIL, /* assume we're creating a new rowtype */
-		   allow_system_table_mods);
+		   allow_system_table_mods,
+		   allow_anonymous_records);
 	}
 }
 
@@ -494,7 +496,8 @@ void
 CheckAttributeType(const char *attname,
    Oid atttypid, Oid attcollation,
    List *containing_rowtypes,
-   bool allow_system_table_mods)
+   bool allow_system_table_mods,
+   bool allow_anonymous_records)
 {
 	char		att_typtype = get_typtype(atttypid);
 	Oid			att_typelem;
@@ -507,7 +510,8 @@ CheckAttributeType(const char *attname,
 		 * catalogs (this allows creating pg_statistic and cloning it during
 		 * VACUUM FULL)
 		 */
-		if (atttypid != ANYARRAYOID || !allow_system_table_mods)
+		if (!((atttypid == ANYARRAYOID && allow_system_table_mods) ||
+(atttypid == RECORDOID && allow_anonymous_records)))
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 	 errmsg("column \"%s\" has pseudo-type %s",
@@ -520,7 +524,8 @@ CheckAttributeType(const char *attname,
 		 */
 		CheckAttributeType(attname, getBaseType(atttypid), attcollation,
 		   containing_rowtypes,
-		   allow_system_table_mods);
+		   allow_system_table_mods,
+		   allow_anonymous_records);
 	}
 	else if (att_typtype == TYPTYPE_COMPOSITE)
 	{
@@ -558,7 +563,8 @@ CheckAttributeType(const char *attname,
 			CheckAttributeType(NameStr(attr->attname),
 			   attr->atttypid, attr->attcollation,
 			   containing_rowtypes,
-			   allow_system_table_mods);
+			   allow_system_table_mods,
+			   allow_anonymous_records);
 		}
 
 		relation_close(relation, AccessShareLock);
@@ -572,7 +578,8 @@ CheckAttributeType(const char *attname,
 		 */
 		CheckAttributeType(attname, att_typelem, attcollation,
 		   containing_rowtypes,
-		   allow_system_table_mods);
+		   allow_system_table_mods,
+		   allow_anonymous_records);
 	}
 
 	/*
@@ -1063,7 +1070,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	Assert(IsNormalProcessingMode() || IsBootstrapProcessingMode());
 
-	CheckAttributeNamesTypes(tupdesc, relkind, allow_system_table_mods);
+	CheckAttributeNamesTypes(tupdesc, relkind, allow_system_table_mods, false);
 
 	/*
 	 * This would fail later on anyway, if the relation already exists.  But
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 44625a507b..ac5a285be7 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -435,7 +435,7 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 */
 			CheckAttributeType(NameStr(to->attname),
 			   to->atttypid, to->attcollation,
-			   NIL, false);
+			   NIL, false, false);
 		}
 
 		/*
diff --git a/src/backend/commands/tablecmds.c 

rewrite ExecPartitionCheckEmitError

2018-12-06 Thread Alvaro Herrera
Just on cleanliness grounds, I propose to rewrite the function in
$SUBJECT.  I came across this while reviewing some already-committed
patch for partition pruning, and it's been sitting on my laptop ever
since.

I think the current coding is too convoluted and hard to follow.  The
patch makes it much simpler (IMO).

-- 
Álvaro Herrera
>From 8c8ac713ac168b2678925e58cdfc83694a8fa5b9 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 14 Nov 2018 20:57:19 -0300
Subject: [PATCH] rewrite ExecPartitionCheckEmitError

---
 src/backend/executor/execMain.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d83d296d82..78c8ce4935 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1837,11 +1837,9 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			TupleTableSlot *slot,
 			EState *estate)
 {
-	Relation	rel = resultRelInfo->ri_RelationDesc;
-	Relation	orig_rel = rel;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
-	char	   *val_desc;
-	Bitmapset  *modifiedCols;
+	Oid			root_relid;
+	TupleDesc	tupdesc;
+	char	   *valdsc;
 	Bitmapset  *insertedCols;
 	Bitmapset  *updatedCols;
 
@@ -1851,11 +1849,13 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 	 */
 	if (resultRelInfo->ri_PartitionRoot)
 	{
-		TupleDesc	old_tupdesc = RelationGetDescr(rel);
+		TupleDesc	old_tupdesc;
 		AttrNumber *map;
 
-		rel = resultRelInfo->ri_PartitionRoot;
-		tupdesc = RelationGetDescr(rel);
+		root_relid = RelationGetRelid(resultRelInfo->ri_PartitionRoot);
+
+		old_tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+		tupdesc = RelationGetDescr(resultRelInfo->ri_PartitionRoot);
 		/* a reverse map */
 		map = convert_tuples_by_name_map_if_req(old_tupdesc, tupdesc,
 gettext_noop("could not convert row type"));
@@ -1868,20 +1868,24 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			slot = execute_attr_map_slot(map, slot,
 		 MakeTupleTableSlot(tupdesc, ));
 	}
+	else
+	{
+		root_relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+		tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+	}
 
 	insertedCols = GetInsertedColumns(resultRelInfo, estate);
 	updatedCols = GetUpdatedColumns(resultRelInfo, estate);
-	modifiedCols = bms_union(insertedCols, updatedCols);
-	val_desc = ExecBuildSlotValueDescription(RelationGetRelid(rel),
-			 slot,
-			 tupdesc,
-			 modifiedCols,
-			 64);
+	valdsc = ExecBuildSlotValueDescription(root_relid,
+		   slot,
+		   tupdesc,
+		   bms_union(insertedCols, updatedCols),
+		   64);
 	ereport(ERROR,
 			(errcode(ERRCODE_CHECK_VIOLATION),
 			 errmsg("new row for relation \"%s\" violates partition constraint",
-	RelationGetRelationName(orig_rel)),
-			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
+	RelationGetRelationName(resultRelInfo->ri_RelationDesc)),
+			 valdsc ? errdetail("Failing row contains %s.", valdsc) : 0));
 }
 
 /*
-- 
2.11.0



Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-12-06 Thread Peter Geoghegan
On Sun, Dec 2, 2018 at 4:21 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > About the status of the patch, to me it should be RWF. It's been
> > moved to the next CF several times with no progress besides rebases.
>
> Let me disagree.  Judging from the commentary in this discussion it could be
> a significant and useful feature, and the author is trying to keep this patch
> up to date.  The lack of reviews could be due to reasons other than the
> desirability of the patch (as is the case for many other interesting
> proposals on hackers).

+1. I, for one, care about this feature. We're all very busy, but I
don't want to see it die.

-- 
Peter Geoghegan



don't create storage when unnecessary

2018-12-06 Thread Alvaro Herrera
Some time ago, after partitioned indexes had been pushed, I realized
that even though I didn't want them to have relfilenodes, they did.  And
looking closer I noticed that *a lot* of relation kinds that didn't need
relfilenodes, had them anyway.

This patch fixes that; if no relfilenode is needed, it's not created.

I didn't verify pg_upgrade behavior across this commit.  Maybe something
needs tweaking there.


PS: I think it'd be worth following up with this ...
https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=avcyh...@mail.gmail.com

-- 
Álvaro Herrera
>From 25d80040ffe2fa4fa173b4db7a6e91461542fef3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 9 Aug 2018 16:18:08 -0400
Subject: [PATCH] don't create storage when not necessary

---
 src/backend/catalog/heap.c |  2 +-
 src/backend/utils/cache/relcache.c |  8 
 src/include/utils/rel.h| 10 ++
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 11debaa780..6f73ff4fcf 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -372,7 +372,7 @@ heap_create(const char *relname,
 	 */
 	if (OidIsValid(relfilenode))
 		create_storage = false;
-	else
+	else if (create_storage)
 		relfilenode = relid;
 
 	/*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index c3071db1cd..76a4cc65be 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1253,6 +1253,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	/* these relations kinds have no storage */
+	if (relation->rd_rel->relkind == RELKIND_VIEW ||
+		relation->rd_rel->relkind == RELKIND_COMPOSITE_TYPE ||
+		relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+		relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+		relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+		return;
+
 	if (relation->rd_rel->reltablespace)
 		relation->rd_node.spcNode = relation->rd_rel->reltablespace;
 	else
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 2217081dcc..6ac40fd84c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -451,12 +451,14 @@ typedef struct ViewOptions
 /*
  * RelationIsMapped
  *		True if the relation uses the relfilenode map.
- *
- * NB: this is only meaningful for relkinds that have storage, else it
- * will misleadingly say "true".
  */
 #define RelationIsMapped(relation) \
-	((relation)->rd_rel->relfilenode == InvalidOid)
+	(((relation)->rd_rel->relfilenode == InvalidOid) && \
+	 ((relation)->rd_rel->relkind == RELKIND_RELATION || \
+	  (relation)->rd_rel->relkind == RELKIND_INDEX ||  \
+	  (relation)->rd_rel->relkind == RELKIND_SEQUENCE || \
+	  (relation)->rd_rel->relkind == RELKIND_TOASTVALUE || \
+	  (relation)->rd_rel->relkind == RELKIND_MATVIEW))
 
 /*
  * RelationOpenSmgr
-- 
2.11.0



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-06 Thread Peter Geoghegan
On Thu, Dec 6, 2018 at 12:51 PM Alexander Korotkov  wrote:
> So, the algorithm introduced by 218f51584d5 is broken.  It tries to
> guarantee that there are no inserters in the subtree by placing a
> cleanup lock on the subtree root, assuming that an inserter holds pins
> on the path from root to leaf.  But due to concurrent splits of
> internal pages, the pins held may not correspond to the actual path.
> I don't see a way to fix this easily.  So, I think we should revert it
> from the back branches and try to reimplement it in master.
>
> However, I'd like to note that 218f51584d5 introduces two changes:
> 1) Cleanup locking only if there are pages to delete
> 2) Cleanup locking only the subtree root
> The 2nd one is broken.  But the 1st one still seems good and useful to
> me, because in the vast majority of cases vacuum doesn't delete any
> index pages.  So, I propose to revert 218f51584d5, but keep the logic
> that locks the root for cleanup only when there are pages to delete.
> Any thoughts?

Can you post a patch that just removes the 2nd part, leaving the
still-correct first part?

Your proposal sounds reasonable, but I'd like to see what you have in
mind in detail before commenting.

Thanks
-- 
Peter Geoghegan



Re: Compressed TOAST Slicing

2018-12-06 Thread Paul Ramsey
On Sun, Dec 2, 2018 at 7:03 AM Rafia Sabih wrote:
>
> The idea looks good and, believing your performance evaluation, it seems
> like a practical one too.

Thank you kindly for the review!

> A comment explaining how this check differs for the is_slice case would
> be helpful.
> Looks like PG indentation is not followed here for n.

I have attached updated patches that add the comment and adhere to the Pg
variable declaration indentation styles.
ATB!
P

--
Paul Ramsey
http://crunchydata.com


compressed-datum-slicing-left-20190103a.patch
Description: Binary data


compressed-datum-slicing-20190103a.patch
Description: Binary data


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-06 Thread Alexander Korotkov
On Thu, Dec 6, 2018 at 10:56 PM Alexander Korotkov  wrote:
> On Tue, Dec 4, 2018 at 8:01 PM Andres Freund  wrote:
> > On 2018-11-10 17:42:16 -0800, Peter Geoghegan wrote:
> > > On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan  wrote:
> > > > Teodor: Do you think that the issue is fixable? It looks like there
> > > > are serious issues with the design of 218f51584d5 to me. I don't think
> > > > the general "there can't be any inserters at this subtree" thing works
> > > > given that we have to couple buffer locks when moving right for other
> > > > reasons. We call ginStepRight() within ginFinishSplit(), for reasons
> > > > that date back to bug fix commit ac4ab97e from 2013 -- that detail is
> > > > probably important, because it seems to be what breaks the subtree
> > > > design (we don't delete in two phases or anything in GIN).
> > >
> > > Ping?
> > >
> > > This is a thorny problem, and I'd like to get your input soon. I
> > > suspect that reverting 218f51584d5 may be the ultimate outcome, even
> > > though it's a v10 feature.
> >
> > Teodor, any chance for a response here?  It's not OK to commit something
> > and then just not respond to bug-reports, especially when they're well
> > diagnose and clearly point towards issues in a commit of yours.
>
> Unfortunately, Teodor is unreachable at least till next week.
>
> > Alexander, CCing you, perhaps you could point the thread out to Teodor?
>
> I'll take a look at this issue.

I've run through the thread.

So, the algorithm introduced by 218f51584d5 is broken.  It tries to
guarantee that there are no inserters in the subtree by placing a
cleanup lock on the subtree root, assuming that an inserter holds pins
on the path from root to leaf.  But due to concurrent splits of
internal pages, the pins held may not correspond to the actual path.
I don't see a way to fix this easily.  So, I think we should revert it
from the back branches and try to reimplement it in master.

However, I'd like to note that 218f51584d5 introduces two changes:
1) Cleanup locking only if there are pages to delete
2) Cleanup locking only the subtree root
The 2nd one is broken.  But the 1st one still seems good and useful to
me, because in the vast majority of cases vacuum doesn't delete any
index pages.  So, I propose to revert 218f51584d5, but keep the logic
that locks the root for cleanup only when there are pages to delete.
Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: minor fix in CancelVirtualTransaction

2018-12-06 Thread Alvaro Herrera
Same patch, commit message added.  Adding to commitfest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2cbe6fe5ac7617e424a63b820a6a2c83b712bab5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 6 Dec 2018 17:30:58 -0300
Subject: [PATCH] Optimize CancelVirtualTransaction a little bit

When scanning the list of virtual transactions looking for one
particular vxid, we cancel the transaction *and* break out of the loop
only if both backendId and localTransactionId match.  The canceling
part is okay, but limiting the break to that case is pointless: if we
found the correct backendId, there cannot be any other entry with the
same backendId.  Rework the loop so that we break out of it if backendId
matches even if the localTransactionId part doesn't.

Discussion: https://postgr.es/m/20180629170512.d6kof6l6uu3qqpd6@alvherre.pgsql
---
 src/backend/storage/ipc/procarray.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index dc7e875680..5ed588d9be 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2683,18 +2683,20 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
 
 		GET_VXID_FROM_PGPROC(procvxid, *proc);
 
-		if (procvxid.backendId == vxid.backendId &&
-			procvxid.localTransactionId == vxid.localTransactionId)
+		if (procvxid.backendId == vxid.backendId)
 		{
-			proc->recoveryConflictPending = true;
-			pid = proc->pid;
-			if (pid != 0)
+			if (procvxid.localTransactionId == vxid.localTransactionId)
 			{
-				/*
-				 * Kill the pid if it's still here. If not, that's what we
-				 * wanted so ignore any errors.
-				 */
-				(void) SendProcSignal(pid, sigmode, vxid.backendId);
+				proc->recoveryConflictPending = true;
+				pid = proc->pid;
+				if (pid != 0)
+				{
+					/*
+					 * Kill the pid if it's still here. If not, that's what we
+					 * wanted so ignore any errors.
+					 */
+					(void) SendProcSignal(pid, sigmode, vxid.backendId);
+				}
 			}
 			break;
 		}
-- 
2.11.0



Re: Alter table documentation page (again)

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Lætitia Avrot wrote:

> > I'd rename the action in ON DELETE/UPDATE to referential_action, both in
> > alter_table and in create_table (the latter just for consistency).
> 
> I love that "option 3" !  So clever!

Thanks :-)  I checked the SQL standard after sending that email, and
indeed it calls those elements <referential action>.  Go figure.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Alter table documentation page (again)

2018-12-06 Thread Lætitia Avrot
>
>
> I'd rename the action in ON DELETE/UPDATE to referential_action, both in
> alter_table and in create_table (the latter just for consistency).
>

I love that "option 3" !  So clever!


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-06 Thread Alexander Korotkov
On Tue, Dec 4, 2018 at 8:01 PM Andres Freund  wrote:
> On 2018-11-10 17:42:16 -0800, Peter Geoghegan wrote:
> > On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan  wrote:
> > > Teodor: Do you think that the issue is fixable? It looks like there
> > > are serious issues with the design of 218f51584d5 to me. I don't think
> > > the general "there can't be any inserters at this subtree" thing works
> > > given that we have to couple buffer locks when moving right for other
> > > reasons. We call ginStepRight() within ginFinishSplit(), for reasons
> > > that date back to bug fix commit ac4ab97e from 2013 -- that detail is
> > > probably important, because it seems to be what breaks the subtree
> > > design (we don't delete in two phases or anything in GIN).
> >
> > Ping?
> >
> > This is a thorny problem, and I'd like to get your input soon. I
> > suspect that reverting 218f51584d5 may be the ultimate outcome, even
> > though it's a v10 feature.
>
> Teodor, any chance for a response here?  It's not OK to commit something
> and then just not respond to bug-reports, especially when they're well
> diagnose and clearly point towards issues in a commit of yours.

Unfortunately, Teodor is unreachable at least till next week.

> Alexander, CCing you, perhaps you could point the thread out to Teodor?

I'll take a look at this issue.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Deleting no-op CoerceToDomains from plan trees

2018-12-06 Thread Tom Lane
Attached is a complete patch for one of the tasks discussed in [1],
namely dropping CoerceToDomain nodes from plan trees if their domains
have no constraints that need to be checked.
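
For illustration, a minimal example (names hypothetical) of the kind of
domain this optimization targets:

-- A domain with no constraints: its CoerceToDomain node has nothing to
-- check, so the planner can drop it from the plan tree entirely.
CREATE DOMAIN plain_text AS text;

CREATE TABLE notes (body plain_text);

-- With the patch, this cast costs nothing at execution time.
SELECT 'hello'::plain_text;

-- Adding a constraint later must invalidate any cached plans that
-- relied on the domain being check-free (hence the sinval signaling
-- discussed below).
ALTER DOMAIN plain_text ADD CHECK (VALUE <> '');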

After I fleshed out the original patch to include sinval signaling for
changes of domain constraints, I was surprised to find that the existing
test case in domain.sql still failed.  The reason turned out to be that
the CoerceToDomain node was not being done as part of a normal plpgsql
expression, but within a cast expression cached by plpgsql's
get_cast_hashentry() function ... and that code has no provision for
cache flushes.

I concluded that the best way to fix that was to create new functionality
in plancache.c for caching expression trees with appropriate invalidation
support, so this patch includes that.  (Conceivably that should be a
separate patch, but it didn't really seem worth the trouble.)

As patched, get_cast_hashentry() is the only user of the new plancache
code.  I looked around at other callers of expression_planner() to see
if anything else was caching the result for more than one query.
I found such a caller in typcache.c's domain constraint caching, but
as discussed in [2], it's not really clear that there's any point in
changing that code; I just added a comment about the issue instead.

(I also noted some callers in relation partitioning code, which
I'm a tad suspicious of, mainly because it's not apparent to me why
that code should be executing arbitrary expressions in the first
place.  But if there's anything broken there, it seems like material
for a separate discussion.)

Some other notes for review:

* In typecmds.c, I fixed some inconsistency about whether the various
subcommands of ALTER DOMAIN released catalog locks or not.

* I used a dlist to thread CachedExpressions together in plancache.c,
and failed to resist the temptation to change CachedPlanSources to
use a dlist as well.  That eliminates a potential performance problem
in DropCachedPlan, though I'm not sure how important that is.

regards, tom lane

[1] https://www.postgresql.org/message-id/5978.1544030...@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/12539.1544107...@sss.pgh.pa.us

diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 1ffc823..544f423 100644
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
***
*** 62,67 
--- 62,68 
  #include "parser/parse_type.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
+ #include "utils/inval.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
*** AlterDomainDefault(List *names, Node *de
*** 2297,2303 
  	ObjectAddressSet(address, TypeRelationId, domainoid);
  
  	/* Clean up */
! 	heap_close(rel, NoLock);
  	heap_freetuple(newtuple);
  
  	return address;
--- 2298,2304 
  	ObjectAddressSet(address, TypeRelationId, domainoid);
  
  	/* Clean up */
! 	heap_close(rel, RowExclusiveLock);
  	heap_freetuple(newtuple);
  
  	return address;
*** AlterDomainDropConstraint(List *names, c
*** 2494,2501 
  	systable_endscan(conscan);
  	heap_close(conrel, RowExclusiveLock);
  
- 	heap_close(rel, NoLock);
- 
  	if (!found)
  	{
  		if (!missing_ok)
--- 2495,2500 
*** AlterDomainDropConstraint(List *names, c
*** 2509,2516 
--- 2508,2525 
  			constrName, TypeNameToString(typename))));
  	}
  
+ 	/*
+ 	 * We must send out an sinval message for the domain, to ensure that any
+ 	 * dependent plans get rebuilt.  Since this command doesn't change the
+ 	 * domain's pg_type row, that won't happen automatically; do it manually.
+ 	 */
+ 	CacheInvalidateHeapTuple(rel, tup, NULL);
+ 
  	ObjectAddressSet(address, TypeRelationId, domainoid);
  
+ 	/* Clean up */
+ 	heap_close(rel, RowExclusiveLock);
+ 
  	return address;
  }
  
*** AlterDomainAddConstraint(List *names, No
*** 2615,2620 
--- 2624,2636 
  	if (!constr->skip_validation)
  		validateDomainConstraint(domainoid, ccbin);
  
+ 	/*
+ 	 * We must send out an sinval message for the domain, to ensure that any
+ 	 * dependent plans get rebuilt.  Since this command doesn't change the
+ 	 * domain's pg_type row, that won't happen automatically; do it manually.
+ 	 */
+ 	CacheInvalidateHeapTuple(typrel, tup, NULL);
+ 
  	ObjectAddressSet(address, TypeRelationId, domainoid);
  
  	/* Clean up */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c729a99..b645648 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** adjust_paths_for_srfs(PlannerInfo *root,
*** 5923,5932 
   * side-effect that is useful when the expression will get evaluated more than
   * once.  Also, we must fix operator function IDs.
   *
   * Note: this must not make any damaging changes to the passed-in expression
   * tree.  (It would actually be okay to 

How do I get altered object from GRANT event trigger?

2018-12-06 Thread Jeremy Finzel
I noticed that there is no information for GRANT in
pg_event_trigger_ddl_commands().  I am trying to determine which
table/schema is being altered.  Is there any way to do this either with
built-in functionality, or extending the current code in event_trigger.c?
Is the reason for this limitation that GRANT does not perform the same kind
of catalog changes that other commands do?
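
For reference, a minimal sketch (names hypothetical) of the kind of trigger
involved; as noted above, a GRANT fires it but yields no usable object
information from this function:

CREATE OR REPLACE FUNCTION report_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        RAISE NOTICE 'tag=% type=% identity=%',
            r.command_tag, r.object_type, r.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER report_ddl_trg ON ddl_command_end
    EXECUTE PROCEDURE report_ddl();

-- Fires the trigger, but no table/schema details come back for GRANT:
GRANT SELECT ON some_table TO some_role;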

Thanks!
Jeremy


Re: Alter table documentation page (again)

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Lætitia Avrot wrote:

> Any thought ? Which option do you prefer ? I can write the patch, but I'd
> like to know what you think about that.

I'd rename the action in ON DELETE/UPDATE to referential_action, both in
alter_table and in create_table (the latter just for consistency).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: slow queries over information schema.tables

2018-12-06 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 6, 2018 at 12:03 PM Tom Lane  wrote:
>> [ further experimentation... ]  It looks like if you prepare
>> a query and then just execute it repeatedly in one transaction,
>> you'd not reach AIM (as long as you were getting generic plans).
>> Possibly that's a gap worth closing.

> If we called it at the start of every query, couldn't we dispense with
> the call in relation_openrv_extended()?

No.  You need to do AIM *after* obtaining the lock, else you still
have the race condition that you can execute a query on a table
without being aware of recent DDL on it.

What we could possibly do to close the gap cited above is to have
plancache.c's CheckCachedPlan force an AIM call if it notices that
the plan it wants to re-use has a non-empty PlanInvalItems list.
If it does not, then there is nothing that AIM would cause invalidation
for anyway.  This still leaves us with a query-duration-sized race
condition window for DDL on functions and domains, but not any larger
than that.

regards, tom lane



Alter table documentation page (again)

2018-12-06 Thread Lætitia Avrot
Hi,

I was trying to answer a former colleague's question about Postgres'
default behaviour when deleting or updating, when she pointed out that the
Postgres ALTER TABLE documentation page uses the 'action' keyword for two
different things.

And indeed she is right :
- it's used to describe any action available in the ALTER TABLE statement
- and also to reference the action taken on update or delete (see the
example below)
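
A quick illustration of the two senses (table names hypothetical):

-- sense 1: an ALTER TABLE "action", i.e. a subcommand:
ALTER TABLE orders ADD COLUMN note text;

-- sense 2: the "action" taken on update or delete, i.e. a referential action:
ALTER TABLE order_lines
  ADD CONSTRAINT order_fk FOREIGN KEY (order_id)
  REFERENCES orders (id) ON DELETE CASCADE;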

Here are my thoughts about that conflict :
- We should keep the action keyword for the on update or on delete action,
as that's what's done on the CREATE TABLE documentation page.
- Option 1: We could categorize the "actions" we can do with ALTER
TABLE (as column_action, trigger_action, constraint_action, rule_action,
other_action), but I'm not a big fan of "other_action" and I can't think of
another name.
- Option 2: Or we could just rename the first action as "alter-table_action".

Any thought ? Which option do you prefer ? I can write the patch, but I'd
like to know what you think about that.

Anyway, thanks to Brigitte Blanc-Lafay for having pointed this out! :-)

Cheers,

Lætitia
-- 
*Think! Do you really need to print this email ? *
*There is no Planet B.*


Re: proposal: plpgsql pragma statement

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 18:17, Robert Haas
wrote:

> On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule 
> wrote:
> > My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This
> > is not a runtime statement - the information from this command will be
> > attached to the related object - function, block, or command - at parse time.
>
> That's sensible, but the syntax you were proposing didn't look like it
> was related to a specific statement.  I was objecting to the idea that
> PRAGMA whatever; should be construed as an annotation of,
> specifically, the following statement.
>

Please, can you propose something that you would like?

For my purposes I can imagine a PRAGMA at the function level with the same
syntax as PL/SQL - I need to push some information somewhere that
plpgsql_check can use to protect users against false alarms. Locality is
not too important for me at this moment. But I prefer a solution that
doesn't look too strange, and that is possible with a change to just the
plpgsql parser.
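
For example, a function-level pragma might look like this (purely
hypothetical syntax, modeled on PL/SQL's declaration-section pragmas;
nothing here is implemented):

CREATE FUNCTION make_table(tabname text) RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    PRAGMA cmdtype CREATE;   -- hypothetical: a hint consumed by plpgsql_check
BEGIN
    EXECUTE format('CREATE TABLE %I (a int)', tabname);
END;
$$;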

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: WIP: Avoid creation of the free space map for small tables

2018-12-06 Thread John Naylor
On 12/3/18, Amit Kapila  wrote:
> On Mon, Dec 3, 2018 at 11:15 AM John Naylor  wrote:
>> Per your recent comment, we no longer check relation size if we waited
>> on a relation extension lock, so this is essentially a reversion to an
>> earlier version.
>>
>
> fsm_local_set is being called from RecordAndGetPageWithFreeSpace and
> GetPageWithFreeSpace whereas the change we have discussed was specific
> to GetPageWithFreeSpace, so not sure if we need any change in
> fsm_local_set.

Not needed, but I wrongly assumed you'd think it unclear otherwise.
I've now restored the generality and updated the comments to be closer
to v8.

> It would be good if you add few comments atop functions
> GetPageWithFreeSpace, RecordAndGetPageWithFreeSpace and
> RecordPageWithFreeSpace about their interaction with local map.

Done. Also additional minor comment editing.

I've added an additional regression test for finding the right block
and removed a test I thought was redundant. I've kept the test file in
its own schedule.
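
As a quick way to see the intended behavior (table name hypothetical; with
the patch applied, a heap of 4 blocks or fewer gets no FSM fork):

CREATE TABLE tiny (a int);
INSERT INTO tiny SELECT g FROM generate_series(1, 100) g;  -- fits in 1 block
VACUUM tiny;
SELECT pg_relation_size('tiny', 'fsm');  -- expected to be 0 under the patch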

-John Naylor
From 7494ff89be7ad4e07f277f3bcc6e866c5527ce45 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 6 Dec 2018 10:49:03 -0600
Subject: [PATCH v10 1/2] Avoid creation of the free space map for small
 tables.

The FSM isn't created if the heap has 4 blocks or fewer. If the last
known target block has insufficient space, try every block before extending
the heap.

If a heap with an FSM is truncated back to below the threshold, the FSM
stays around and can be used as usual.
---
 contrib/pageinspect/expected/page.out |  77 +++
 contrib/pageinspect/sql/page.sql  |  33 +--
 doc/src/sgml/storage.sgml |  13 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|  10 +-
 src/backend/access/heap/hio.c |  47 ++--
 src/backend/access/transam/xact.c |  14 ++
 src/backend/commands/vacuumlazy.c |  17 +-
 src/backend/storage/freespace/README  |  11 +-
 src/backend/storage/freespace/freespace.c | 251 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h   |   9 +-
 src/test/regress/expected/fsm.out |  62 ++
 src/test/regress/parallel_schedule|   5 +
 src/test/regress/sql/fsm.sql  |  46 
 15 files changed, 498 insertions(+), 105 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 --------
    8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 -------
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 
--------
-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 ------
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
+ fsm_page_contents 
+-------------------

Re: proposal: plpgsql pragma statement

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule  wrote:
> My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This is
> not a runtime statement - the information from this command will be attached
> to the related object - function, block, or command - at parse time.

That's sensible, but the syntax you were proposing didn't look like it
was related to a specific statement.  I was objecting to the idea that
PRAGMA whatever; should be construed as an annotation of,
specifically, the following statement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 18:05, Pavel Stehule
wrote:

>
>
> On Thu, Dec 6, 2018 at 17:57, Robert Haas
> wrote:
>
>> On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris 
>> wrote:
>> > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it
>> affecting runtime?
>>
>> Well, I don't know what Ada does with PRAGMA exactly, but look at
>> these examples from Oracle:
>>
>> http://psoug.org/definition/pragma.htm
>>
>> You wouldn't *execute* those at runtime, but at least for some of
>> them, the runtime behavior would depend on whether or not they were
>> specified.  It certainly seems possible that we might want to have
>> similar things.
>>
>
> My proposal doesn't block it.
>
> The pragma in Ada has three levels - function, block, statement. I propose
> (at this moment) just statement-level syntax, but I am sure other
> levels are possible.
>

My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This is
not a runtime statement - the information from this command will be attached
to the related object - function, block, or command - at parse time.


> I would like to have autonomous functions or autonomous blocks too, and the
> Ada syntax (the same as in PL/SQL) is good.
>
> Regards
>
> Pavel
>
>
>
>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>


Re: slow queries over information schema.tables

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 12:03 PM Tom Lane  wrote:
> In my testing, that still hits AIM() during parserOpenTable().

Oh, I see.  relation_openrv_extended() calls it.

> [ further experimentation... ]  It looks like if you prepare
> a query and then just execute it repeatedly in one transaction,
> you'd not reach AIM (as long as you were getting generic plans).
> Possibly that's a gap worth closing.

If we called it at the start of every query, couldn't we dispense with
the call in relation_openrv_extended()?  On net, that would actually
mean fewer calls to AcceptInvalidationMessages(), assuming you
sometimes run queries that touch multiple relations.  And the behavior
would be more predictable, too, because you'd (hopefully) have no
cases where a command failed to see the results of DDL that committed
before the command was issued.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 17:57, Robert Haas
wrote:

> On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris 
> wrote:
> > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it
> affecting runtime?
>
> Well, I don't know what Ada does with PRAGMA exactly, but look at
> these examples from Oracle:
>
> http://psoug.org/definition/pragma.htm
>
> You wouldn't *execute* those at runtime, but at least for some of
> them, the runtime behavior would depend on whether or not they were
> specified.  It certainly seems possible that we might want to have
> similar things.
>

My proposal doesn't block it.

The pragma in Ada has three levels - function, block, statement. I propose
(at this moment) just statement-level syntax, but I am sure other
levels are possible.

I would like to have autonomous functions or autonomous blocks too, and the
Ada syntax (the same as in PL/SQL) is good.

Regards

Pavel




> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: slow queries over information schema.tables

2018-12-06 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 6, 2018 at 11:32 AM Tom Lane  wrote:
>> It's fairly hard to imagine practical cases where we'd not call
>> AcceptInvalidationMessages at least once per query, so I'm not
>> very sure what you're on about.

> Unless I'm confused, it happens any time you run a query that only
> touches tables using lockmodes previously acquired by the current
> transaction.  Like:

> BEGIN;
> some query;
> the same query again;

In my testing, that still hits AIM() during parserOpenTable().

[ further experimentation... ]  It looks like if you prepare
a query and then just execute it repeatedly in one transaction,
you'd not reach AIM (as long as you were getting generic plans).
Possibly that's a gap worth closing.
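
A minimal sketch of that scenario (table name hypothetical):

BEGIN;
PREPARE q(int) AS SELECT * FROM t WHERE a = $1;
EXECUTE q(1);   -- first execution takes locks and reaches AIM()
EXECUTE q(1);   -- once a generic plan is reused, the locks are already
                -- held locally, so AcceptInvalidationMessages() is skipped
COMMIT;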

regards, tom lane



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Jonah H. Harris
Ahh. Gotcha. Makes sense.

On Thu, Dec 6, 2018 at 11:57 AM Robert Haas  wrote:

> On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris 
> wrote:
> > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it
> affecting runtime?
>
> Well, I don't know what Ada does with PRAGMA exactly, but look at
> these examples from Oracle:
>
> http://psoug.org/definition/pragma.htm
>
> You wouldn't *execute* those at runtime, but at least for some of
> them, the runtime behavior would depend on whether or not they were
> specified.  It certainly seems possible that we might want to have
> similar things.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
-- 
Jonah H. Harris


Re: [PATCH] Opclass parameters

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 11:55 AM Tom Lane  wrote:
> How about saying that you must give an opclass name if you want to
> specify options, ie the syntax is
>
> [ opclass_name [ ( options... ) ] ]
>
> I'm not necessarily wedded to that, but it seems worth throwing
> out the idea.

Agreed, that's not bad, certainly better than making OPTIONS more reserved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris  wrote:
> IIRC, PRAGMA in Ada was compile-time only. How would you foresee it affecting 
> runtime?

Well, I don't know what Ada does with PRAGMA exactly, but look at
these examples from Oracle:

http://psoug.org/definition/pragma.htm

You wouldn't *execute* those at runtime, but at least for some of
them, the runtime behavior would depend on whether or not they were
specified.  It certainly seems possible that we might want to have
similar things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Opclass parameters

2018-12-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov  wrote:
>> "opclass (options)" looks the most natural to me. But we still need some
>> keyword before the parentheses when the opclass is not specified since we
>> can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
>> in grammar.

> Are you sure?  What's the SQL syntax where there is actually a problem
> here?  CREATE INDEX requires parentheses around a non-trivial
> expression.

Well, the reason we have to require parens around nontrivial expressions
is mostly lack of forethought about making the syntax non-ambiguous :-(

> How about just OPTIONS (options) ?

That would require making OPTIONS a fully reserved word, I think,
else it's ambiguous with an opclass name.

How about saying that you must give an opclass name if you want to
specify options, ie the syntax is

[ opclass_name [ ( options... ) ] ]

I'm not necessarily wedded to that, but it seems worth throwing
out the idea.
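
Under that rule, usage might look like this (hypothetical opclass and
option names; no such syntax exists yet):

-- options are only accepted when an opclass name is spelled out:
CREATE INDEX ON docs USING gist (body my_opclass (siglen = 128));

-- while a bare column, with no options, keeps its default opclass:
CREATE INDEX ON docs USING gist (body);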

regards, tom lane



Re: slow queries over information schema.tables

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 11:32 AM Tom Lane  wrote:
> Robert Haas  writes:
> > I'm not thrilled about depending on sinval without locking,
> > particularly given that my proposal to make sure we
> > AcceptInvalidationMessages() at least once per query was shouted down.
>
> It's fairly hard to imagine practical cases where we'd not call
> AcceptInvalidationMessages at least once per query, so I'm not
> very sure what you're on about.

Unless I'm confused, it happens any time you run a query that only
touches tables using lockmodes previously acquired by the current
transaction.  Like:

BEGIN;
some query;
the same query again;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 17:27, Robert Haas
wrote:

> On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule 
> wrote:
> > I wrote plpgsql_check https://github.com/okbob/plpgsql_check.
> >
> > It is working well, but because it does static analysis only, it can
> > sometimes produce false alarms, or it has to stop the analysis because
> > the necessary data is not available.
> >
> > https://github.com/okbob/plpgsql_check/issues/36
> >
> > I see one possible solution in the introduction of a pragma statement
> > with the syntax:
> >
> >   PRAGMA keyword [content to semicolon];
> >
> > The pragma relates to the following statement. So issue 36 can be
> > solved by the pragma
> >
> > PRAGMA cmdtype CREATE;
> > EXECUTE format('CREATE TABLE xxx ...
> >
> > The PRAGMA statement does nothing at runtime. It works only at compile
> > time, and adds a key/value pair to the next non-pragma statement. This
> > information can be used by some plpgsql extensions.
> >
> > What do you think about this proposal?
>
> I think it's commandeering PRAGMA for a fairly narrow purpose.  It's
> hardly unimaginable that someone in future might want a PRAGMA that
> does change runtime behavior, or that affects something other than the
> statement which immediately follows.
>
> I don't see a big problem allowing some kind of annotation that
> plpgsql_check can easily access, and I don't even mind it being called
> PRAGMA.  But I don't think it should foreclose unrelated uses of
> PRAGMA which somebody might want to introduce in the future.
>

The simplest (and probably the only possible) implementation is a plpgsql
statement, because I would prefer not to modify the SQL lexer.

I think the PRAGMA statement is natural given its semantics and usage in
Ada, but I am open to any proposals.

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [PATCH] Opclass parameters

2018-12-06 Thread Robert Haas
On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov  wrote:
> I agree that we should distinguish per-index and per-column options, but they
> can also be AM-specific and opclass-specific.

True, but an index is bound to a single AM, and a column is bound to a
single opclass which is bound to a single AM.  So I'm not very worried
about name collisions.  Can't we just tell opclass authors to pick
names that are unlikely to collide with names chosen by the AM or names
that are globally reserved?  It's hard to imagine that we're ever
going to have more than a dozen or so options that could possibly
apply to a column, so splitting things up into different namespaces
seems like an unnecessary burden on the user.

> 'fastupdate' option for GIN is an example of AM-specific per-index option.
>
> ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column
> options having special SQL syntax.  "AM-class-specific" here means "specific
> only for the class of AMs that support ordering".  Now they are stored as 
> flags
> in pg_index.indoption[] but later can be moved to pg_attribute.attoptions.

Or left where they are.

> And another problem is the options with default values.  They may be not
> explicitly  specified by the user, and in this case in current implementation
> nothing will be stored in the catalog because default values can be obtained
> from the code.  But this will not allow changing of default values without
> compatibility issues. So I think it would be better to store both default and
> explicitly specified values of all opclass options, but this will require a
> major refactoring of current API.

Hmm.  I think if the default ever changes, it will require a new major
release, and we can compensate in pg_dump.  That means that a dump
taken with an old version will not preserve the options.  However,
using the pg_dump from the newest release is the recommended
procedure, and if you don't do that, you might get outright failures.
Failing to preserve an option value in the rare case that a default
was changed seems less bad than that.

> Also I have idea to define list of opclass parameters declaratively when 
> opclass
> is created using syntax like the following:
>
> CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) 
> ]
> FOR TYPE ... AS (
>  {  OPTIONS function_name ( arg_type [,...] )   /* reloptions => binary */
>   | OPERATOR ...
>  } [,...]
> )

I'm not sure exposing SQL syntax for this is a very good idea.

> "[opclass] WITH OPTIONS (options)" looks too verbose, of course.

It's not that bad.

> "[opclass] WITH (options)" looks acceptable, but it seems to conflict with
> exclusion constraints syntax ("index_elem WITH operator").

Are you sure?  The operator can't be (

But it might be confusing anyhow.

> "opclass (options)" looks the most natural to me. But we still need some
> keyword before the parentheses when the opclass is not specified since we
> can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
> in grammar.

Are you sure?  What's the SQL syntax where there is actually a problem
here?  CREATE INDEX requires parentheses around a non-trivial
expression.

How about just OPTIONS (options) ?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)

2018-12-06 Thread Evgeniy Efimkin
Hello!

Thank you for questions!

> I've got few questions:
> 1. How will the subscription work for inherited tables? Do we need tests for 
> that?
For subscriptions created with `FOR TABLE` we can't support inherited tables
because the subscriber doesn't know anything about inheritance. In the new
patch I removed `ONLY` for `FOR TABLE` in subscription-related statements.
> 2. ALTER PUBLICATION has ADD\DROP and SET. Should we add SET too? Or is there 
> a reason not to do that?
Added it in the new patch.
> 3. Message "Must be superuser to create FOR ALL TABLES subscriptions" seems a 
> bit strange to me. Also, this literal is embedded into translations. I do not 
> know how we deal with it, how do we deal for example with "måste vara 
> superanvändare för att skapa prenumerationer" or "для создания подписок нужно 
> быть суперпользователем"? Where do we insert FOR ALL TABLES?
I added the hint `Use CREATE SUBSCRIPTION ... FOR TABLE ...`.
> 4. How does default behavior differ from FOR ALL TABLES?
It is the same as the default behavior: omitting the clause means FOR ALL TABLES.
> 5. Can we alter subscription FOR ALL TABLES? Drop some tables out of the 
> subscription?
For subscriptions created with `FOR ALL TABLES` (the default), you can't
change the subscribed tables with `ALTER SUBSCRIPTION ADD/DROP TABLE`; you
should use `ALTER SUBSCRIPTION REFRESH PUBLICATION` instead.

And I don't know how to handle dumping of user-created subscriptions,
because currently a non-superuser can't select the subconninfo column.
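
For reference, a sketch of the syntax this patch is proposing (connection
string elided; not part of any released PostgreSQL):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=... dbname=...'
    PUBLICATION mypub
    FOR TABLE t1, t2;

ALTER SUBSCRIPTION mysub ADD TABLE t3;
ALTER SUBSCRIPTION mysub DROP TABLE t1;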

03.12.2018, 09:06, "Andrey Borodin" :
> Hi, Evgeniy!
>
> Thanks for working on the feature.
>>  On Nov 28, 2018, at 21:41, Evgeniy Efimkin
>> wrote:
>>
>>  Hello!
>>  I wrote some tests(it's just 01_rep_changes.pl but for non superuser) and 
>> fix `DROP TABLE` from subscription. Now old and new tests pass.
>>
>>  22.11.2018, 16:23, "Evgeniy Efimkin" :
>>>  Hello!
>>>  New draft attached with filtering table in subscription (ADD/DROP) and 
>>> allow non-superusers use` CREATE SUBSCRIPTION` for own tables.
>
> I've looked into the patch. The code looks good and coherent to nearby code.
> There are no docs, obviously, there is WiP.
>
> I've got few questions:
> 1. How will the subscription work for inherited tables? Do we need tests for 
> that?
> 2. ALTER PUBLICATION has ADD\DROP and SET. Should we add SET too? Or is there 
> a reason not to do that?
> 3. Message "Must be superuser to create FOR ALL TABLES subscriptions" seems a 
> bit strange to me. Also, this literal is embedded into translations. I do not 
> know how we deal with it, how do we deal for example with "måste vara 
> superanvändare för att skapa prenumerationer" or "для создания подписок нужно 
> быть суперпользователем"? Where do we insert FOR ALL TABLES?
> 4. How does default behavior differ from FOR ALL TABLES?
> 5. Can we alter subscription FOR ALL TABLES? Drop some tables out of the 
> subscription?
>
> Best regards, Andrey Borodin.

 
Ефимкин Евгений
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index e136aa6a0b..0782dd40f0 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -38,6 +38,7 @@
 #include "utils/syscache.h"
 
 
+
 static List *textarray_to_stringlist(ArrayType *textarray);
 
 /*
@@ -70,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub->name = pstrdup(NameStr(subform->subname));
 	sub->owner = subform->subowner;
 	sub->enabled = subform->subenabled;
+	sub->alltables = subform->suballtables;
 
 	/* Get conninfo */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID,
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9021463a4c..58f71a227c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 
+#include "commands/dbcommands.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
 #include "commands/subscriptioncmds.h"
@@ -322,6 +323,22 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	char		originname[NAMEDATALEN];
 	bool		create_slot;
 	List	   *publications;
+	AclResult	aclresult;
+	bool 		alltables;
+
+	alltables = !stmt->tables;
+	/* FOR ALL TABLES requires superuser */
+	if (alltables && !superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to create subscriptions"),
+				  errhint("Use CREATE SUBSCRIPTION ... FOR TABLE ..."))));
+
+	/* must have CREATE privilege on database */
+	aclresult = pg_database_aclcheck(MyDatabaseId, GetUserId(), ACL_CREATE);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, OBJECT_DATABASE,
+	   get_database_name(MyDatabaseId));
 
 	/*
 	 * Parse and check options.
@@ -342,11 +359,6 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	if (create_slot)
 		PreventInTransactionBlock(isTopLevel, "CREATE SUBSCRIPTION ... WITH (create_slot = true)");
 
-	if (!superuser())
-		

Re: proposal: plpgsql pragma statement

2018-12-06 Thread Jonah H. Harris
IIRC, PRAGMA in Ada was compile-time only. How would you foresee it
affecting runtime?

On Thu, Dec 6, 2018 at 11:27 AM Robert Haas  wrote:

> On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule 
> wrote:
> > I wrote plpgsql_check https://github.com/okbob/plpgsql_check.
> >
> > It is working well, but because it does static analysis only, it can
> > sometimes produce false alarms, or it has to stop the analysis because
> > the necessary data is not available.
> >
> > https://github.com/okbob/plpgsql_check/issues/36
> >
> > I see one possible solution in the introduction of a pragma statement
> > with the syntax:
> >
> >   PRAGMA keyword [content to semicolon];
> >
> > The pragma relates to the following statement. So issue 36 can be
> > solved by the pragma
> >
> > PRAGMA cmdtype CREATE;
> > EXECUTE format('CREATE TABLE xxx ...
> >
> > The PRAGMA statement does nothing at runtime. It works only at compile
> > time, and adds a key/value pair to the next non-pragma statement. This
> > information can be used by some plpgsql extensions.
> >
> > What do you think about this proposal?
>
> I think it's commandeering PRAGMA for a fairly narrow purpose.  It's
> hardly unimaginable that someone in future might want a PRAGMA that
> does change runtime behavior, or that affects something other than the
> statement which immediately follows.
>
> I don't see a big problem allowing some kind of annotation that
> plpgsql_check can easily access, and I don't even mind it being called
> PRAGMA.  But I don't think it should foreclose unrelated uses of
> PRAGMA which somebody might want to introduce in the future.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
Jonah H. Harris


Re: slow queries over information schema.tables

2018-12-06 Thread Tom Lane
Robert Haas  writes:
> I'm not thrilled about depending on sinval without locking,
> particularly given that my proposal to make sure we
> AcceptInvalidationMessages() at least once per query was shouted down.

It's fairly hard to imagine practical cases where we'd not call
AcceptInvalidationMessages at least once per query, so I'm not
very sure what you're on about.

regards, tom lane



Re: proposal: plpgsql pragma statement

2018-12-06 Thread Robert Haas
On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule  wrote:
> I wrote plpgsql_check https://github.com/okbob/plpgsql_check.
>
> It is working well, but because it does static analysis only, it can
> sometimes produce false alarms, or it has to stop the analysis because the
> necessary data is not available.
>
> https://github.com/okbob/plpgsql_check/issues/36
>
> I see one possible solution in the introduction of a pragma statement with the syntax:
>
>   PRAGMA keyword [content to semicolon];
>
> The pragma relates to the following statement. So issue 36 can be
> solved by the pragma
>
> PRAGMA cmdtype CREATE;
> EXECUTE format('CREATE TABLE xxx ...
>
> The PRAGMA statement does nothing at runtime. It works only at compile time,
> and adds a key/value pair to the next non-pragma statement. This information
> can be used by some plpgsql extensions.
>
> What do you think about this proposal?

I think it's commandeering PRAGMA for a fairly narrow purpose.  It's
hardly unimaginable that someone in future might want a PRAGMA that
does change runtime behavior, or that affects something other than the
statement which immediately follows.

I don't see a big problem allowing some kind of annotation that
plpgsql_check can easily access, and I don't even mind it being called
PRAGMA.  But I don't think it should foreclose unrelated uses of
PRAGMA which somebody might want to introduce in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: psql display of foreign keys

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Robert Haas wrote:

> On Wed, Dec 5, 2018 at 11:30 AM Alvaro Herrera  
> wrote:
> > I added this patch to commitfest in order to get more opinions,
> > particularly on whether to backpatch this.  I might commit sooner than
> > that if others care to comment.
> 
> I don't think this is a bug fix, and therefore I oppose back-patching it.

OK.

That means I can just get pg_partition_root() done first, and try to
write the queries using that instead of WITH RECURSIVE.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: psql display of foreign keys

2018-12-06 Thread Robert Haas
On Wed, Dec 5, 2018 at 11:30 AM Alvaro Herrera  wrote:
> I added this patch to commitfest in order to get more opinions,
> particularly on whether to backpatch this.  I might commit sooner than
> that if others care to comment.

I don't think this is a bug fix, and therefore I oppose back-patching it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: slow queries over information schema.tables

2018-12-06 Thread Robert Haas
On Wed, Dec 5, 2018 at 1:41 PM Tom Lane  wrote:
> Ah, yes, that is a case where we might have enough knowledge to prove
> the test redundant --- but considering that we explicitly discourage
> domain NOT NULL as bad style and not fully supported, I can't get
> excited about it.  I suppose in some cases we might be able to use
> predtest.c to prove domain CHECK constraints redundant, but I bet that
> it's not worth the trouble.
>
> The bigger picture here is that people seem to like to use domains
> as simple type aliases, which will never have any constraints, but
> we're very dumb about that today.  So the patch as presented seems
> like it has lots of potential applicability, as long as we fix the
> invalidation aspect.

I'm not thrilled about depending on sinval without locking,
particularly given that my proposal to make sure we
AcceptInvalidationMessages() at least once per query was shouted down.
That means that in corner cases, you could execute many queries
without flushing the old cache entries.  However, I do agree that
locking every type, function, operator, etc. involved in the query is
impractical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: minor leaks in pg_dump (PG tarball 10.6)

2018-12-06 Thread Stephen Frost
Greetings,

* Pavel Raiskup (prais...@redhat.com) wrote:
> On Wednesday, December 5, 2018 4:59:18 PM CET Stephen Frost wrote:
> > This change doesn't seem to make any sense to me..?  If anything, seems
> > like we'd end up overallocating memory *after* this change, where we
> > don't today (though an analyzer tool might complain because we don't
> > free the memory from it and instead copy the pointer from each of these
> > items into the tbinfo structure).
> 
> Correct, I hadn't thought that one through.  I was confused because some items
> related to the dropped columns could be unreferenced.  But those are
> anyway allocated as a solid block with the others (not intended to ever be
> free()'d).  Feel free to ignore that.

Right.

I've pushed the other changes.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Minor typo

2018-12-06 Thread Stephen Frost
Greetings,

* Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> At Tue, 4 Dec 2018 11:37:16 -0500, Stephen Frost  wrote 
> in <20181204163716.gr3...@tamriel.snowman.net>
> > Thanks to everyone for sharing their thoughts here, the goal is simply
> > to try and have the comments as clear as we can for everyone.
> > 
> > Please find attached a larger rewording of the comment in xlogrecord.h,
> > and a minor change to xloginsert.c to clarify that we're removing a hole
> > and not doing compression at that point.
> > 
> > Other thoughts on this..?
> 
> Thank you for the complete rewriting. It makes the description
> clearer at least for me, execpt the following:
> 
> >  * present is BLCKSZ - the length of "hole" bytes.
> 
> Maybe it's because I'm not so accustomed to punctuation marks but
> I was confused for a second because the '-' didn't look to me like a
> minus sign, but a dash. If it is not specific to me, the word
> 'minus' seems better or, (BLCKSZ - <the length of "hole">) bytes
> is clearer for me.

I agree - that threw me off a bit too when I was first reading it.  I've
updated that part a bit and pushed it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 17:02, Robert Haas
wrote:

> On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule 
> wrote:
> > On Thu, Dec 6, 2018 at 16:26, Robert Haas
> > wrote:
> >> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule 
> wrote:
> >> > I have a problem imagining it. When the fill factor is low, there
> >> > is a high risk of heavy fragmentation - or somebody has to do
> >> > defragmentation.
> >>
> >> I don't understand this.
> >
> > I don't know whether zheap has any tools for eliminating fragmentation
> > of the space on a page. But I expect that after some set of updates, when
> > record sizes are mutable, the free space on a page will become fragmented.
> > Usually, when you have less free space, fragmentation happens faster.
>
> Still not sure I completely understand, but it's true that zheap
> sometimes needs to compact free space on a page.  For example, if
> you've got a page with a 100-byte hole, and somebody updates a tuple
> to make it 2 bytes bigger, you've got to shift that tuple and any that
> precede it backwards to reduce the size of the hole to 98 bytes, so
> that you can fit the new version of the tuple.  If, later, somebody
> shrinks that tuple back to the original size, you've now got 100 bytes
> of free space on the page, but they are fragmented: 98 bytes in the
> "hole," and 2 bytes following the newly-shrunk tuple.  If someone
> tries to insert a 100-byte tuple in that page, we'll need to
> reorganize the page a second time to bring all that free space back
> together in a single chunk.
>
> In my view, and I'm not sure if this is how the code currently works,
> we should have just one routine to do a zheap page reorganization
> which can cope with all possible scenarios.  I imagine that you would
> give it the page as it currently exists plus a "minimum tuple size"
> for one or more tuples on the page (which must not be smaller than the
> current size of that tuple, but could be bigger).  It then reorganizes
> the page so that every tuple for which a minimum size was given
> consumes exactly that amount of space, every other tuple consumes the
> minimum possible amount of space, and the remaining space goes into
> the hole.  So if you call this function with no minimal tuple sizes,
> it does a straight defragmentation; if you give it minimum tuple
> sizes, then it rearranges the page to make it suitable for a pending
> in-place update of those tuples.
>
> Actually, I think Amit and I discussed further refining this by
> splitting the page reorganization function in half.  One half would
> make a plan for where to put each tuple on the page following the
> reorg, but would not actually do anything.  That would be executed
> before entering a critical section, and might fail if the requested
> minimum tuple sizes can't be satisfied.  The other half would take the
> previously-constructed plan as input and perform the reorganization.
> That would be done in the critical section.
>
>
Thank you for the reply

Pavel

-- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule  wrote:
> On Thu, Dec 6, 2018 at 16:26, Robert Haas wrote:
>> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule  
>> wrote:
>> > I have a problem imagining it. When the fill factor is low, there
>> > is a high risk of heavy fragmentation - or somebody has to do
>> > defragmentation.
>>
>> I don't understand this.
>
> I don't know whether zheap has any tools for eliminating fragmentation
> of the space on a page. But I expect that after some set of updates, when
> record sizes are mutable, the free space on a page will become fragmented.
> Usually, when you have less free space, fragmentation happens faster.

Still not sure I completely understand, but it's true that zheap
sometimes needs to compact free space on a page.  For example, if
you've got a page with a 100-byte hole, and somebody updates a tuple
to make it 2 bytes bigger, you've got to shift that tuple and any that
precede it backwards to reduce the size of the hole to 98 bytes, so
that you can fit the new version of the tuple.  If, later, somebody
shrinks that tuple back to the original size, you've now got 100 bytes
of free space on the page, but they are fragmented: 98 bytes in the
"hole," and 2 bytes following the newly-shrunk tuple.  If someone
tries to insert a 100-byte tuple in that page, we'll need to
reorganize the page a second time to bring all that free space back
together in a single chunk.

In my view, and I'm not sure if this is how the code currently works,
we should have just one routine to do a zheap page reorganization
which can cope with all possible scenarios.  I imagine that you would
give it the page as it currently exists plus a "minimum tuple size"
for one or more tuples on the page (which must not be smaller than the
current size of that tuple, but could be bigger).  It then reorganizes
the page so that every tuple for which a minimum size was given
consumes exactly that amount of space, every other tuple consumes the
minimum possible amount of space, and the remaining space goes into
the hole.  So if you call this function with no minimal tuple sizes,
it does a straight defragmentation; if you give it minimum tuple
sizes, then it rearranges the page to make it suitable for a pending
in-place update of those tuples.

Actually, I think Amit and I discussed further refining this by
splitting the page reorganization function in half.  One half would
make a plan for where to put each tuple on the page following the
reorg, but would not actually do anything.  That would be executed
before entering a critical section, and might fail if the requested
minimum tuple sizes can't be satisfied.  The other half would take the
previously-constructed plan as input and perform the reorganization.
That would be done in the critical section.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 16:26, Robert Haas
wrote:

> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule 
> wrote:
> > I have a problem imagining it. When the fill factor is low, there is
> > a high risk of heavy fragmentation - or somebody has to do
> > defragmentation.
>
> I don't understand this.
>

I don't know whether zheap has any tools for eliminating fragmentation of
the space on a page. But I expect that after some set of updates, when
record sizes are mutable, the free space on a page will become fragmented.
Usually, when you have less free space, fragmentation happens faster.

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: COPY FROM WHEN condition

2018-12-06 Thread Robert Haas
On Wed, Nov 28, 2018 at 6:17 PM Tomas Vondra
 wrote:
> > Compared with the overhead of setting a snapshot before evaluating every
> > row, and considering that this kind of usage is not frequent, it seems to
> > me the behavior is acceptable.
>
> I'm not really buying the argument that this behavior is acceptable
> simply because using the feature like this will be uncommon. That seems
> like a rather weak reason to accept it.
>
> I however agree we don't want to make COPY less efficient, at least not
> unless absolutely necessary. But I think we can handle this simply by
> restricting what's allowed to appear the FILTER clause.
>
> It should be fine to allow IMMUTABLE and STABLE functions, but not
> VOLATILE ones. That should fix the example I shared, because f() would
> not be allowed.

I don't think that's a very good solution.  It's perfectly sensible
for someone to want to do WHERE/FILTER random() < 0.01 to load a
smattering of rows, and this would rule that out for no very good
reason.

I think it would be fine to just document that if the filter condition
examines the state of the database, it will not see the results of the
COPY which is in progress.  I'm pretty sure there are other cases -
for example with triggers - where you can get code to run that can't
see the results of the command currently in progress, so I don't
really buy the idea that having a feature which works that way is
categorically unacceptable.

I agree that we can't justify flagrantly wrong behavior on the grounds
that correct behavior is expensive or on the grounds that the
incorrect cases will be rare. However, when there's more than one
sensible behavior, it's not unreasonable to pick one that is easier to
implement or cheaper or whatever, and that's the category into which I
would put this decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule  wrote:
> I have a problem imagining it. When the fill factor is low, there is a
> high risk of heavy fragmentation - or somebody has to do
> defragmentation.

I don't understand this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 16:12, Robert Haas
wrote:

> On Thu, Dec 6, 2018 at 2:11 AM Pavel Stehule 
> wrote:
> >> > I am sorry, I know zero about zheap - does zheap use fill factor? if
> yes, why?
> >>
> >> Good question.  It is required because tuples can expand (Update tuple
> >> to bigger length).  In such cases, we try to perform in-place update
> >> if there is a space in the page.  So, having fillfactor can help.
> >
> > Thank you for reply :)
>
> I suspect fillfactor is *more* likely to help with zheap than with the
> current heap.  With the current heap, you need to leave enough space
> to store entire copies of the tuples to try to get HOT updates.  But
> with zheap you only need enough room for the anticipated growth in the
> tuples.
>
> For instance, let's say that you plan to update 30% of the tuples in a
> table and make them 1 byte larger.  With the heap, you'd need to leave
> ~ 3/13 = 23% of each page empty, plus a little bit more to allow for
> the storage growth.  So to make all of those updates HOT, you would
> probably need a fillfactor of roughly 75%.  Unfortunately, that will
> make your table larger by one-third, which is terrible.
>
> On the other hand, with zheap, you only need to leave enough room for
> the increased amount of tuple data.  If you've got 121 items per page,
> as in Mithun's statistics, that means you need 121 bytes of free space
> to do all the updates in place.  That means you need a fillfactor of 1
> - (121/8192) = ~98%.  To be conservative you can set a fillfactor of
> say 95%.  Your table will only get slightly bigger, and all of your
> updates will be in place, and everything will be great.  At least with
> respect to fillfactor -- zheap is not free of other problems.
>

I have trouble imagining it. When the fill factor is low, there is a high
risk of heavy fragmentation - or else somebody has to do defragmentation.


> Of course, you don't really set fillfactor based on an expectation of
> a single round of tuple updates, but imagine that the workload goes on
> for a while, with tuples getting bigger and smaller again as the exact
> values being stored change.  In a heap table, you need LOTS of empty
> space on each page to get HOT updates.  In a zheap table, you need
> very little, because the updates are in place.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: \gexec \watch

2018-12-06 Thread David Fetter
On Thu, Dec 06, 2018 at 05:01:26AM -0300, Alvaro Herrera wrote:
> On 2018-Dec-06, David Fetter wrote:
> 
> > There's a bit of a philosophical issue here, or a mathematical one,
> > whichever way you want to put it.  Does it actually make sense to have
> > the behavior of one "semicolon" spill onto another?
> 
> Honestly, I don't see the mathematicality in this.  It either works, or
> it doesn't -- and from my POV right now it doesn't.  Are you saying we
> need a \gexecwatch for this to work?

We could call it something a little shorter, but yes. The part you
trimmed away had descriptions of why the current behavior is correct.
We don't really have ways to compose \ operators. If we're going to
add composition to the psql language, we should think it through
carefully, not just glom it on for a single special case.

> I can of course solve my problem with a simple python program, but psql
> is so close ...

Agreed!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: zheap: a new storage format for PostgreSQL

2018-12-06 Thread Robert Haas
On Thu, Dec 6, 2018 at 2:11 AM Pavel Stehule  wrote:
>> > I am sorry, I know zero about zheap - does zheap use fill factor? If
>> > yes, why?
>>
>> Good question.  It is required because tuples can expand (an UPDATE can
>> make a tuple longer).  In such cases, we try to perform an in-place
>> update if there is space on the page.  So, having a fillfactor can help.
>
> Thank you for the reply :)

I suspect fillfactor is *more* likely to help with zheap than with the
current heap.  With the current heap, you need to leave enough space
to store entire copies of the tuples to try to get HOT updates.  But
with zheap you only need enough room for the anticipated growth in the
tuples.

For instance, let's say that you plan to update 30% of the tuples in a
table and make them 1 byte larger.  With the heap, you'd need to leave
~ 3/13 = 23% of each page empty, plus a little bit more to allow for
the storage growth.  So to make all of those updates HOT, you would
probably need a fillfactor of roughly 75%.  Unfortunately, that will
make your table larger by one-third, which is terrible.

On the other hand, with zheap, you only need to leave enough room for
the increased amount of tuple data.  If you've got 121 items per page,
as in Mithun's statistics, that means you need 121 bytes of free space
to do all the updates in place.  That means you need a fillfactor of 1
- (121/8192) = ~98%.  To be conservative you can set a fillfactor of
say 95%.  Your table will only get slightly bigger, and all of your
updates will be in place, and everything will be great.  At least with
respect to fillfactor -- zheap is not free of other problems.
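
To double-check the arithmetic, both figures fall out of a quick query,
using the 8192-byte page size and the 121 items per page assumed above:

SELECT round(100 * (1 - 121.0 / 8192), 1) AS zheap_fillfactor_pct,  -- 98.5
       round(100 * (1 - 3.0 / 13), 1)     AS heap_fillfactor_pct;   -- 76.9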

Of course, you don't really set fillfactor based on an expectation of
a single round of tuple updates, but imagine that the workload goes on
for a while, with tuples getting bigger and smaller again as the exact
values being stored change.  In a heap table, you need LOTS of empty
space on each page to get HOT updates.  In a zheap table, you need
very little, because the updates are in place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Mutability of domain CHECK constraints

2018-12-06 Thread Tom Lane
ALTER DOMAIN ADD CONSTRAINT goes to some effort to verify that existing
stored data of the domain type meets the new constraint.  (It's not
bulletproof, because it can't see uncommitted data, but at least it
tries.)  However, what if the user tries to change the behavior of
an existing constraint clause?  Nothing gets rechecked, of course, since
we have no idea that anything has changed.

This issue occurred to me while thinking about this buglet:

regression=# create function sqlcheck(int) returns bool as
regression-# 'select $1 > 0' language sql;
CREATE FUNCTION
regression=# create domain checkedint as int check(sqlcheck(value));
CREATE DOMAIN
regression=# select 1::checkedint;  -- ok
 checkedint 
------------
          1
(1 row)

regression=# select 0::checkedint;  -- fail
ERROR:  value for domain checkedint violates check constraint "checkedint_check"
regression=# create or replace function sqlcheck(int) returns bool as
regression-# 'select $1 <= 0' language sql;
CREATE FUNCTION
regression=# select 1::checkedint;  -- fail?
 checkedint 
------------
          1
(1 row)

regression=# select 0::checkedint;  -- ok?  
ERROR:  value for domain checkedint violates check constraint "checkedint_check"

The reason this isn't behaving as expected is that typcache.c has cached a
version of the domain's check constraint that sqlcheck() has been inlined
into, so the old behavior continues to apply until something happens to
cause the typcache entry to be flushed.

I'd started to work on some code changes to make the typcache react more
promptly, but then it occurred to me that the example is really dubious
anyway because any stored data of the domain type won't be rechecked.
And fixing *that* seems entirely impractical.
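
To illustrate, a hypothetical continuation, imagining the table had been
created and filled before the CREATE OR REPLACE above:

regression=# create table tt (c checkedint);
CREATE TABLE
regression=# insert into tt values (1);  -- passes the original constraint
INSERT 0 1
-- ... now redefine sqlcheck() as above; the stored row is never rechecked:
regression=# select * from tt;
 c 
---
 1
(1 row)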

So what I'm thinking we should do is document that the behavior of a
domain CHECK constraint is expected to be immutable, and it's on the
user's head to preserve consistency if it isn't.  We could recommend
that any attempt to change a constraint's behavior be implemented by
dropping and re-adding the constraint, which is a case that the system
does know what to do with.
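
That is, something along these lines; the ADD step rechecks stored data
(modulo the uncommitted-data caveat above) and will fail if any existing
value violates the constraint's new behavior:

alter domain checkedint drop constraint checkedint_check;
alter domain checkedint add constraint checkedint_check
  check (sqlcheck(value));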

Actually, the same goes for table CHECK constraints ...

Thoughts?

regards, tom lane



Re: \gexec \watch

2018-12-06 Thread Oleksii Kliukin


Hi Álvaro,

> On 6. Dec 2018, at 13:56, Alvaro Herrera  wrote:
> 
> To Oleksii's question, I think if you want to repeat the first query
> over and over, you'd use something like this:
> 
> select format('select now() as execution_time, %L as generation_time', now()) 
> as query \gset
> :query \watch

Nice one, although it only works if the original query outputs a single row 
(because of \gset).
I do agree it’s not that useful to reuse the original query instead of 
executing it anew each time.

Cheers,
Oleksii


Re: \gexec \watch

2018-12-06 Thread Daniel Verite
Alvaro Herrera wrote:

> Hmm, thanks.  AFAICS your hack reexecutes the initial query over and
> over, instead of obtaining a fresh query each time.

I see. That hack is about injecting something programmatically
into the query buffer, but it seems you'd need to do that in a loop.
And if psql had a loop construct you wouldn't need a hack in the
first place I guess!

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: \gexec \watch

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Daniel Verite wrote:

> I think you could achieve more or less the result with a pre-gexec
> hack like that:

Hmm, thanks.  AFAICS your hack reexecutes the initial query over and
over, instead of obtaining a fresh query each time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: \gexec \watch

2018-12-06 Thread Daniel Verite
Alvaro Herrera wrote:

> Honestly, I don't see the mathematicality in this.  It either works, or
> it doesn't -- and from my POV right now it doesn't.  Are you saying we
> need a \gexecwatch for this to work?
> 
> I can of course solve my problem with a simple python program, but psql
> is so close ...

\watch reexecutes what's in the query buffer, and \gexec does not
write into the query buffer, so the desired piping does not happen
by design.

I think you could achieve more or less the result with a pre-gexec
hack like that:

postgres=# \pset tuples_only on
postgres=# select 'select now();' \g /tmp/file.sql
postgres=# \setenv EDITOR touch
postgres=# \e /tmp/file.sql
 2018-12-06 13:54:24.915752+01

postgres=# \watch
 2018-12-06 13:54:42.366559+01

 2018-12-06 13:54:44.368962+01

 2018-12-06 13:54:46.3713+01



The "\setenv EDITOR touch" kludge is meant to force \e to
inject the contents of /tmp/file.sql into the query buffer.
It's needed because "\e file" actually checks whether the file has
been modified (per mtime) after $EDITOR returns, and discards it
if it hasn't.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: \gexec \watch

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 1:56 PM Alvaro Herrera  wrote:

> On 2018-Dec-06, Pavel Stehule wrote:
>
> > On Thu, Dec 6, 2018 at 12:26 PM Oleksii Kliukin  wrote:
>
> > > The other question is whether such a command would execute the original
> > > query every time watch is invoked. Consider, e.g. the following one:
> > >
> > > select format('select now() as execution_time, %L as generation_time',
> > > now()) \gexec
> > > execution_time  | 2018-12-06 12:15:24.136086+01
> > > generation_time | 2018-12-06 12:15:24.13577+01
> > >
> > > If we make \gexec + \watch combination re-execute only the output of
> the
> > > original query (without the query itself), then the generation time
> column
> > > will stay constant through all \watch invocations.
> >
> > It would be better to introduce a new command like \gexec_repeat, with
> > units like 5s, or a repeat count like 5x.
>
> It is?  \gexec \watch is an elegant construct using two existing atoms
> with well-defined semantics.  Can't say I see that in \gexec_repeat --
> it seems non-orthogonal to me.
>

Maybe I am wrong, but currently it is not possible to compose \ commands.

So you would have to introduce a new pattern. The command buffer is long
enough to implement it.

Regards

Pavel


> To Oleksii's question, I think if you want to repeat the first query
> over and over, you'd use something like this:
>
> select format('select now() as execution_time, %L as generation_time',
> now()) as query \gset
> :query \watch
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: \gexec \watch

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Pavel Stehule wrote:

> On Thu, Dec 6, 2018 at 12:26 PM Oleksii Kliukin  wrote:

> > The other question is whether such a command would execute the original
> > query every time watch is invoked. Consider, e.g. the following one:
> >
> > select format('select now() as execution_time, %L as generation_time',
> > now()) \gexec
> > execution_time  | 2018-12-06 12:15:24.136086+01
> > generation_time | 2018-12-06 12:15:24.13577+01
> >
> > If we make \gexec + \watch combination re-execute only the output of the
> > original query (without the query itself), then the generation time column
> > will stay constant through all \watch invocations.
> 
> It would be better to introduce a new command like \gexec_repeat, with
> units like 5s, or a repeat count like 5x.

It is?  \gexec \watch is an elegant construct using two existing atoms
with well-defined semantics.  Can't say I see that in \gexec_repeat --
it seems non-orthogonal to me.

To Oleksii's question, I think if you want to repeat the first query
over and over, you'd use something like this:

select format('select now() as execution_time, %L as generation_time', now()) 
as query \gset
:query \watch

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: \gexec \watch

2018-12-06 Thread Alvaro Herrera
Hi Oleksii

On 2018-Dec-06, Oleksii Kliukin wrote:

> The other question is whether such a command would execute the
> original query every time watch is invoked. Consider, e.g. the
> following one:
> 
> select format('select now() as execution_time, %L as generation_time', now()) 
> \gexec
> execution_time  | 2018-12-06 12:15:24.136086+01
> generation_time | 2018-12-06 12:15:24.13577+01
> 
> If we make \gexec + \watch combination re-execute only the output of
> the original query (without the query itself), then the generation
> time column will stay constant through all \watch invocations.

Hmm, I think reusing the first query is not terribly useful.  My
example (thus far) is something like this

select format('select tableoid::regclass, * from %s where ctid = ''(%s,%s)''',
              relation::regclass, page, tuple)
  from pg_locks
 where locktype = 'tuple' and
       pid in (select pid from pg_locks
                where granted = false and locktype = 'transactionid') and
       database = (select oid from pg_database
                    where datname = current_database())
\gexec [\watch]

which is supposed to report the current tuple-level conflicts (two
updates concurrently in the same tuple, etc).  I want to get the
PK/replica identity[*] of all tuples that some backend is currently
waiting for; if the query remains constant, it will return me the
identity of the tuple located in the CTID of the tuples that conflicted
in the first iteration, which is completely useless.

[*] Right now it just reports all columns rather than PK ... I intend to
add that bit next.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Hint and detail punctuation

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, Daniel Gustafsson wrote:

> > On 6 Dec 2018, at 05:46, Michael Paquier  wrote:
> > 
> > On Wed, Dec 05, 2018 at 05:22:25PM +0100, Daniel Gustafsson wrote:
> >> While looking at error messages downstream, I noticed a few hints and
> >> details in postgres which aren’t punctuated as per the style guide.  The
> >> attached patch fixes the ones where it seemed reasonable to end with a
> >> period.
> > 
> > Good point.  I am spotting a couple more places:
> > src/backend/utils/misc/guc.c:
> 
> Ah, for some reason I hadn’t thought about looking at the GUCs.  I agree with
> your findings, patch updated.

LGTM.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: \gexec \watch

2018-12-06 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 12:26 PM Oleksii Kliukin  wrote:

>
>
> > On 6. Dec 2018, at 09:01, Alvaro Herrera 
> wrote:
> >
> > On 2018-Dec-06, David Fetter wrote:
> >
> >> There's a bit of a philosophical issue here, or a mathematical one,
> >> whichever way you want to put it.  Does it actually make sense to have
> >> the behavior of one "semicolon" spill onto another?
> >
> > Honestly, I don't see the mathematicality in this.  It either works, or
> > it doesn't -- and from my POV right now it doesn't.  Are you saying we
> > need a \gexecwatch for this to work?
>
> I’ve been trying to do similar stuff with periodic execution of \gexec
> (changing the tablespace of all tables in a given tablespace and retrying,
> since some of them could only get a lock on subsequent attempts) and
> generally reverted to a bash loop outside of psql, but having it built-in
> would be great.
>
> Perhaps a numeric argument to \gexec, i.e. \gexec 5 to re-execute the
> output of a query every 5 seconds?
>

that does not look intuitive :)


> The other question is whether such a command would execute the original
> query every time watch is invoked. Consider, e.g. the following one:
>
> select format('select now() as execution_time, %L as generation_time',
> now()) \gexec
> execution_time  | 2018-12-06 12:15:24.136086+01
> generation_time | 2018-12-06 12:15:24.13577+01
>
> If we make \gexec + \watch combination re-execute only the output of the
> original query (without the query itself), then the generation time column
> will stay constant through all \watch invocations.
>

It would be better to introduce a new command like \gexec_repeat, with
units like 5s, or a repeat count like 5x.

Regards

Pavel



>
> Cheers,
> Oleksii
>


Re: Hint and detail punctuation

2018-12-06 Thread Michael Paquier
On Thu, Dec 06, 2018 at 09:42:26AM +0100, Daniel Gustafsson wrote:
> Ah, for some reason I hadn’t thought about looking at the GUCs.  I agree with
> your findings, patch updated.

Thanks Daniel, that looks fine to me at a quick glance.  I'll try to get
that committed tomorrow my time if there are no objections before then.
--
Michael


signature.asc
Description: PGP signature


Re: \gexec \watch

2018-12-06 Thread Oleksii Kliukin



> On 6. Dec 2018, at 09:01, Alvaro Herrera  wrote:
> 
> On 2018-Dec-06, David Fetter wrote:
> 
>> There's a bit of a philosophical issue here, or a mathematical one,
>> whichever way you want to put it.  Does it actually make sense to have
>> the behavior of one "semicolon" spill onto another?
> 
> Honestly, I don't see the mathematicality in this.  It either works, or
> it doesn't -- and from my POV right now it doesn't.  Are you saying we
> need a \gexecwatch for this to work?

I’ve been trying to do similar stuff with periodic execution of \gexec
(changing the tablespace of all tables in a given tablespace and retrying,
since some of them could only get a lock on subsequent attempts) and
generally reverted to a bash loop outside of psql, but having it built-in
would be great.

Perhaps a numeric argument to \gexec, i.e. \gexec 5 to re-execute the output of 
a query every 5 seconds?

The other question is whether such a command would execute the original query 
every time watch is invoked. Consider, e.g. the following one:

select format('select now() as execution_time, %L as generation_time', now()) 
\gexec
execution_time  | 2018-12-06 12:15:24.136086+01
generation_time | 2018-12-06 12:15:24.13577+01

If we make \gexec + \watch combination re-execute only the output of the 
original query (without the query itself), then the generation time column will 
stay constant through all \watch invocations.

Cheers,
Oleksii


Re: Hint and detail punctuation

2018-12-06 Thread Daniel Gustafsson
> On 6 Dec 2018, at 05:46, Michael Paquier  wrote:
> 
> On Wed, Dec 05, 2018 at 05:22:25PM +0100, Daniel Gustafsson wrote:
>> While looking at error messages downstream, I noticed a few hints and details
>> in postgres which aren’t punctuated as per the style guide.  The attached
>> patch fixes the ones where it seemed reasonable to end with a period.
> 
> Good point.  I am spotting a couple more places:
> src/backend/utils/misc/guc.c:

Ah, for some reason I hadn’t thought about looking at the GUCs.  I agree with
your findings, patch updated.

cheers ./daniel



errhint_punctuation-v2.patch
Description: Binary data


Re: \gexec \watch

2018-12-06 Thread Alvaro Herrera
On 2018-Dec-06, David Fetter wrote:

> There's a bit of a philosophical issue here, or a mathematical one,
> whichever way you want to put it.  Does it actually make sense to have
> the behavior of one "semicolon" spill onto another?

Honestly, I don't see the mathematicality in this.  It either works, or
it doesn't -- and from my POV right now it doesn't.  Are you saying we
need a \gexecwatch for this to work?

I can of course solve my problem with a simple python program, but psql
is so close ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services