Re: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)
On Wed, Nov 21, 2018 at 10:05:34AM -0500, Robert Haas wrote:
> On Tue, Nov 20, 2018 at 1:21 AM Takahashi, Ryohei wrote:
> > My customer uses PostgreSQL on Windows and hits the problem that the
> > following log is written to the server logs too frequently (250
> > thousand times per day).
> > "LOG: could not reserve shared memory region (addr=%p) for child %p:"
> > According to the comment of internal_forkexec(),
> > pgwin32_ReserveSharedMemoryRegion() sometimes fails if ASLR is active.
> > If so, I think administrators are not interested in this log since it
> > is a normal event.

Windows ASLR can't explain your system's behavior.

> You might be right, but maybe we should first try to understand why
> this is happening so frequently. Maybe it's not that normal.

When PostgreSQL installed that workaround, we weren't aware of any system where this message was common enough to make the log volume a problem. I agree the message is harmful on your system, but I'm not inclined to change it to DEBUG1 for the benefit of one system. Can you adopt pgbouncer, to reduce the frequency of starting new backend processes? That should improve your performance, too.

Could you collect the information http://postgr.es/m/20181203053506.gb2860...@rfd.leadboat.com requests? That may help us understand your system's unusual behavior. (The issue in that thread is related but not identical.)
Re: Should new partitions inherit their tablespace from their parent?
On Sat, Nov 24, 2018 at 10:18:17AM +0900, Michael Paquier wrote:
> On Sat, Nov 24, 2018 at 12:32:36PM +1300, David Rowley wrote:
>> On Fri, 23 Nov 2018 at 17:03, David Rowley wrote:
>>> Here's a patch for that. Parking here until January's commitfest.
>>
>> And another, now rebased atop of de38ce1b8 (which I probably should
>> have waited for).

Thanks for the patch, David. I can see that this patch makes the code more consistent for partitioned tables and partitioned indexes when it comes to tablespace handling, which is a very good thing.

-ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
+ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace)

NoStorage looks strange as a routine name for this case. Would something like ATExecPartedRelSetTableSpace perhaps be better adapted?

+ else if (stmt->partbound)
+ {
+     RangeVar   *parent;
+     Relation    parentrel;
+
+     /*
+      * For partitions, when no other tablespace is specified, we default
+      * the tablespace to the parent partitioned table's.
+      */

Okay, so the direct parent is what counts, and not the top-most parent. Could you add a test with multiple levels of partitioned tables, like:
- parent is in tablespace 1.
- child is created in tablespace 2.
- grandchild is created, which should inherit tablespace 2 from the child, but not tablespace 1 from the parent.

In the existing example, as one partition is used to test the top-most parent and another for the new default, it looks cleaner to create a third partition which would itself be a partitioned table.
--
Michael

signature.asc
Description: PGP signature
Re: Support custom socket directory in pg_upgrade
On Sat, Nov 17, 2018 at 10:15:08PM +0100, Daniel Gustafsson wrote:
> > On 15 Nov 2018, at 22:42, Tom Lane wrote:
> >
> > > Further point about that: pg_regress's method of creating a temp
> > > directory under /tmp is secure only on machines with the sticky bit
> > > set on /tmp; otherwise it's possible for an attacker to rename the
> > > temp dir out of the way and inject his own socket. We agreed that
> > > that was an okay risk to take for testing purposes, but I'm much
> > > less willing to assume that it's okay for production use with
> > > pg_upgrade.
>
> That’s a good point, it’s not an assumption I’d be comfortable with when
> it deals with system upgrades.

As in https://postgr.es/m/flat/20140329222934.gc170...@tornado.leadboat.com, I maintain that insecure /tmp is not worth worrying about in any part of PostgreSQL.
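For readers unfamiliar with the mechanism being debated: the sticky bit on a world-writable directory prevents users from renaming or removing entries they do not own, which is what makes a socket placed under /tmp hard to swap out. A minimal illustration in plain Python (not PostgreSQL code; the temporary directory here stands in for /tmp):

```python
import os
import stat
import tempfile

# Create a world-writable directory with the sticky bit set (mode 1777),
# the same protection a properly configured /tmp has.
d = tempfile.mkdtemp()
os.chmod(d, 0o1777)

# S_ISVTX is the sticky bit; with it set, only an entry's owner (or root)
# may rename or unlink that entry inside the directory.
mode = os.stat(d).st_mode
has_sticky = bool(mode & stat.S_ISVTX)
print("sticky bit set:", has_sticky)
```

Without that bit, any local user could rename the temp socket directory out of the way and substitute their own, which is the attack Tom describes above.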
RE: [Todo item] Add entry creation timestamp column to pg_stat_replication
> I have let this stuff cool down for a couple of days; still, it seems to
> me that having one single column makes the most sense when looking at a
> mostly idle system. I'll try to finish this one at the beginning of next
> week; for now I am marking it as ready for committer.

Thank you for your attention!

Best regards,
Myungkyu, Lim
Re: [Todo item] Add entry creation timestamp column to pg_stat_replication
On Tue, Dec 04, 2018 at 04:24:25PM +0900, myungkyu.lim wrote:
> I think the purpose of this field is interval checking and debugging on
> an idle system, so merging both is better.

I have let this stuff cool down for a couple of days; still, it seems to me that having one single column makes the most sense when looking at a mostly idle system. I'll try to finish this one at the beginning of next week; for now I am marking it as ready for committer.
--
Michael

signature.asc
Description: PGP signature
Re: Strange OSX make check-world failure
Samuel Cochran writes:
> System Integrity Protection strips dynamic linker (dyld) environment
> variables, such as DYLD_LIBRARY_PATH, during exec(2) [1]

Yeah. I wish Apple would just fix that silliness ... I'll spare you the rant about why it's stupid, but it is. (BTW, last I looked, it's not exec(2) per se that's doing the damage; the problem is that we're invoking a sub-shell that's considered a protected program for some reason, and it's only the use of that that causes DYLD_LIBRARY_PATH to get removed from the process environment.)

> so we need to rewrite the load paths inside binaries when relocating them
> during make temp-install before make check on darwin.

Interesting proposal, but I think it needs work.

* As coded, this only fixes the problem for references to libpq, not any of our other shared libraries.

* It's also unpleasant that it hard-wires knowledge of libpq's version numbering in a place pretty far removed from anywhere that should know that.

* Just to be annoying, this won't work at all on 32-bit OSX versions unless we link everything with -headerpad_max_install_names. (I know Apple forgot about 32-bit machines long ago, but our buildfarm hasn't.)

* Speaking of not working, I don't think this "find" invocation will report any failure exits from install_name_tool.

* This doesn't fix anything for executables that never get installed, for instance isolationtester.

We could probably fix the first four problems with some more sweat, but I'm not seeing a plausible answer to the last one. Overwriting isolationtester's rpath to make "make check" work would just break it for "make installcheck".

regards, tom lane
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
On Thu, Dec 6, 2018 at 1:27 PM Alvaro Herrera wrote:
> On 2018-Dec-06, Amit Langote wrote:
>> The partitionwise join related changes in PG 11 moved the
>> add_child_rel_equivalences call in set_append_rel_size such that child
>> EC members would be added even before checking if the child rel is
>> dummy, but for a reason named in the comment above the call:
>>
>>     ... Even if this child is
>>     * deemed dummy, it may fall on nullable side in a child-join, which
>>     * in turn may participate in a MergeAppend, where we will need the
>>     * EquivalenceClass data structures.
>>
>> However, I think we can skip adding the dummy child EC members here and
>> instead make it a responsibility of partitionwise join code in
>> joinrels.c to add the needed EC members. Attached a patch to show what
>> I mean, which passes the tests and gives this planning time:
>
> Robert, Ashutosh, any comments on this? I'm unfamiliar with the
> partitionwise join code.

As the comment says, it has to do with the equivalence classes being used during MergeAppend. ECs are used to create the pathkeys used for sorting. Creating a sort node which has a column on the nullable side of an OUTER join will fail if it doesn't find the corresponding equivalence class. You may not notice this if both the partitions being joined are pruned for some reason. Amit's idea to make the partition-wise join code do this may work, but will add a similar overhead, esp. in N-way partition-wise joins, once those equivalence classes are added.
--
Best Wishes,
Ashutosh Bapat
RE: libpq debug log
Hi,

> TODO:
> I will add the feature called "logging level" in the next version of the
> patch. I will attach it as soon as possible.

I created the v4 patch; please find it attached. This patch adds a "logminlevel" parameter. It can be set to level1 or level2, with level1 the default. If level1 is set, the log records the time spent in each function, which helps to find out where time is being taken. If level2 is set, it also outputs information on the protocol being exchanged, in addition to the level1 information.

I would appreciate it if you could review my latest patch.

Regards,
Aya Iwata

v4-0001-libpq-trace-log.patch
Description: v4-0001-libpq-trace-log.patch
Re: Strange OSX make check-world failure
Hi folks,

Forgive me if I'm getting the mailing list etiquette wrong — first time poster. I ended up sitting next to Thomas Munro at PGDU 2018 and talking about testing. While trying to get `make check` running on my macbook, I think I may have fixed this issue.

System Integrity Protection strips dynamic linker (dyld) environment variables, such as DYLD_LIBRARY_PATH, during exec(2) [1], so we need to rewrite the load paths inside binaries when relocating them during make temp-install before make check on darwin. Homebrew does something similar [2]. I've attached a patch which adjusts the Makefile and gets make check working on my machine with SIP intact.

Cheers, Sam

[1]: https://developer.apple.com/library/archive/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html
[2]: https://github.com/Homebrew/brew/blob/77e6a927504c51a1393a0a6ccaf6f2611ac4a9d5/Library/Homebrew/os/mac/keg.rb#L17-L30

On Tue, Sep 18, 2018, at 8:39 AM, Tom Lane wrote:
> Thomas Munro writes:
>> On Tue, Sep 18, 2018 at 2:14 AM Tom Lane wrote:
>>> "make check" generally won't work on OSX unless you've disabled SIP:
>>> https://www.howtogeek.com/230424/how-to-disable-system-integrity-protection-on-a-mac-and-why-you-shouldnt/
>
>> Aha! It looks like it was important to run "make install" before
>> running those tests.
>
> Right. If you don't want to disable SIP, you can work around it by always
> doing "make install" before "make check". Kind of a PITA though.
>
> regards, tom lane

From cdfe53a93453d8cdf12cfaaea13574365fbba66b Mon Sep 17 00:00:00 2001
From: Samuel Cochran
Date: Fri, 7 Dec 2018 15:27:30 +1100
Subject: [PATCH] Fix `make check` on darwin

System Integrity Protection strips dynamic linker (dyld) environment variables, such as DYLD_LIBRARY_PATH, during exec(2), so we need to rewrite the load paths inside binaries when relocating them during make temp-install before make check on darwin.
https://developer.apple.com/library/archive/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 956fd274cd..48f2e2bcc7 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -390,6 +390,10 @@ ifeq ($(MAKELEVEL),0) rm -rf '$(abs_top_builddir)'/tmp_install $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 + # darwin doesn't propagate DYLD_* vars due to system integrity + # protection, so we need to rewrite the load commands inside the + # binaries when relocating them + $(if $(filter $(PORTNAME),darwin),find '$(abs_top_builddir)/tmp_install$(bindir)' -type f -exec install_name_tool -change '$(libdir)/libpq.5.dylib' '$(abs_top_builddir)/tmp_install$(libdir)/libpq.5.dylib' {} \;) endif $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done) endif
RE: Timeout parameters
Hi, There was an invisible space, so I removed it. I registered with 2019-01 commitfest. Best regards, - Ryohei Nagaura > -Original Message- > From: Nagaura, Ryohei [mailto:nagaura.ryo...@jp.fujitsu.com] > Sent: Thursday, December 6, 2018 2:20 PM > To: 'Fabien COELHO' ; > 'pgsql-hack...@postgresql.org' > Cc: Yahorau, A. (IBA) > Subject: RE: Timeout parameters > > Hi, Fabien. > > Thank you for your review. > And I'm very sorry to have kept you waiting so long. > > > About "socket_timeout" > > > I'm generally fine with giving more access to low-level parameters to > users. > > However, I'm not sure I understand the use case you have that needs > > these new extensions. > If you face the following situation, this parameter will be needed. > 1. The connection between the server and the client has been established > normally. > 2. A server process has been received SQL statement. > 3. The server OS can return an ack packet, but it takes time to execute > the SQL statement >Or return the result because the server process is very busy. > 4. The client wants to close the connection while leaving the job to the > server. > In this case, "statement_timeout" can't satisfy at line 4. > > > I think that there is some kind of a misnomer: this is not a > > socket-level timeout, but a client-side query timeout, so it should be > named differently? > Yes, I think so. > > > I'm not sure how to name it, though. > Me too. > > > I think that the way it works is a little extreme, basically the > > connection is aborted from within pqWait, and then restarted from scratch. > > > > There is no clear way to know about the value of the setting (SHOW, > > \set, \pset...). Ok, this is already the case of other connection > parameters. > If this parameter can be needed, I would like to discuss design and optional > functions. > How do you think? > I'll correct patch of "socket_timeout" after that. > > > About "TCP_USER_TIMEOUT" > I fixed on the previous feedback. > Would you review, please? 
> > > There are no tests. > I introduce the test methods of TCP_USER_TIMEOUT. > > Test of client-side TCP_USER_TIMEOUT: > [client operation] > 1. Connect DB server. > postgres=# psql > postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=1 > 5000 > 2. Get the port number by the following command: > postgres=# select inet_client_port(); > 3. Close the client port from the other console of the client machine. >Please rewrite "56750" to the number confirmed on line 2. > $ iptables -I INPUT -p tcp --dport 56750 -j DROP 4. Query the > following SQL: > postgres=# select pg_sleep(10); > 5. TCP USER TIMEOUT works correctly if an error message is output to the > console. > > Test of server-side TCP_USER_TIMEOUT: > [client operation] > 1. Connect DB server. > 2. Get the port number by the following command: > postgres=# select inet_client_port(); > 3. Set the TCP_USER_TIMEOUT by the following command: > postgres=# set tcp_user_timeout=15000; > 4. Query the following SQL: > postgres=# select pg_sleep(10); > 5. Close the client port from the other console. >Please rewrite "56750" to the number confirmed on line 2. > $ iptables -I INPUT -p tcp --dport 56750 -j DROP [server operation] > 6. Verify the logfile. > > > There is no documentation. > I made a patch of documentation of TCP USER TIMEOUT. > > Best regards, > - > Ryohei Nagaura document_v2.patch Description: document_v2.patch TCP_backend_v2.patch Description: TCP_backend_v2.patch TCP_interface_v2.patch Description: TCP_interface_v2.patch
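Stepping back from the patch itself, the client-side tcp_user_timeout parameter under discussion maps onto the TCP_USER_TIMEOUT socket option (Linux 2.6.37 and later). A minimal Python sketch of the underlying knob, just to illustrate what the connection parameter ultimately sets — this is not code from the patch:

```python
import socket

# TCP_USER_TIMEOUT: maximum time (in milliseconds) that transmitted data
# may remain unacknowledged before the kernel forcibly closes the
# connection. Linux-only; socket.TCP_USER_TIMEOUT needs Python 3.6+.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 15000)  # 15 seconds
val = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT)
s.close()
```

With this option set, the iptables-based test above (dropping packets on the client port) makes the kernel give up after roughly the configured interval instead of retransmitting for many minutes.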
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On 06.12.2018 11:52, Peter Geoghegan wrote:

On Wed, Dec 5, 2018 at 10:35 PM Andrey Lepikhov wrote:

This solution changes the pg_depend relation to solve a problem which exists only in regression tests. Very rarely can it occur in the partitioning cases. Or is that not so?

I don't think it's a matter of how rarely this will happen. We're trying to avoid these diagnostic message changes because they are wrong. I don't think that there is much ambiguity about that. Still, it will happen however often the user drops objects belonging to partition children, which could be quite often.

I want to say that we need to localize the rules for the order of the diagnostic messages as much as possible in dependency.c.

I think this decision is somewhat excessive. Maybe you could consider another approach:
1. The order of dependencies in the 'DROP ... CASCADE' case is a problem of the test tools, not the DBMS. And here we can use 'verbose terse'.
2. Print all dependencies in findDependentObjects() on a drop error (see the attachment for a prototype).

You didn't include changes to the regression test output, which seems like a big oversight...

That was done knowingly, to show the differences that this approach introduces in the messages represented in the regression tests.

Anyway, I don't think it's acceptable to change the messages like this. It makes them much less useful.

Could you clarify this? If I understand correctly, your solution is that the user receives a report about the last inserted internal dependency on the object. Why is the full info about the internal dependencies of the object much less useful?

These stability issues keep coming up, which makes a comprehensive approach seem attractive to me. At least 95% of the test instability comes from pg_depend.

During the retail index tuple deletion project (heap cleaner subsystem) we have a non-fixed order of tuples in database relations. This caused an unsteady order of text strings in some check-world test results. Thus, I realized that the order of messages in the test results is mostly a game of chance. For this reason I think it is necessary to find a more general solution to the message-ordering problem.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Dec 6, 2018 at 10:53 PM John Naylor wrote:
> I've added an additional regression test for finding the right block
> and removed a test I thought was redundant. I've kept the test file in
> its own schedule.

+# --
+# fsm does a vacuum, and running it in parallel seems to prevent heap truncation.
+# --
+test: fsm

It is not clear to me from the comment why running it in parallel prevents heap truncation. Can you explain what behavior you are seeing, and what makes you think that running it in parallel caused it?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: zheap: a new storage format for PostgreSQL
On Thu, Dec 6, 2018 at 9:32 PM Robert Haas wrote:
> On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule wrote:
>> čt 6. 12. 2018 v 16:26 odesílatel Robert Haas napsal:
>>> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule wrote:
>>>> I have a problem imagining it. When the fill factor is low, then
>>>> there is a high risk of high fragmentation - or somebody should do
>>>> defragmentation.
>>>
>>> I don't understand this.
>>
>> I don't know whether zheap has any tools for eliminating fragmentation
>> of space within a page. But I expect that after some set of updates,
>> when record size is mutable, the free space on a page would become
>> fragmented. Usually, when you have less memory, fragmentation happens
>> faster.
>
> Still not sure I completely understand, but it's true that zheap
> sometimes needs to compact free space on a page. For example, if you've
> got a page with a 100-byte hole, and somebody updates a tuple to make it
> 2 bytes bigger, you've got to shift that tuple and any that precede it
> backwards to reduce the size of the hole to 98 bytes, so that you can
> fit the new version of the tuple. If, later, somebody shrinks that tuple
> back to the original size, you've now got 100 bytes of free space on the
> page, but they are fragmented: 98 bytes in the "hole," and 2 bytes
> following the newly-shrunk tuple. If someone tries to insert a 100-byte
> tuple into that page, we'll need to reorganize the page a second time to
> bring all that free space back together in a single chunk.
>
> In my view, and I'm not sure if this is how the code currently works, we
> should have just one routine to do a zheap page reorganization which can
> cope with all possible scenarios. I imagine that you would give it the
> page as it currently exists plus a "minimum tuple size" for one or more
> tuples on the page (which must not be smaller than the current size of
> that tuple, but could be bigger). It then reorganizes the page so that
> every tuple for which a minimum size was given consumes exactly that
> amount of space, every other tuple consumes the minimum possible amount
> of space, and the remaining space goes into the hole. So if you call
> this function with no minimum tuple sizes, it does a straight
> defragmentation; if you give it minimum tuple sizes, then it rearranges
> the page to make it suitable for a pending in-place update of those
> tuples.

Yeah, the code is also along these lines. However, as of now, the API takes input for one tuple (its offset number and the delta space, i.e. the additional space required by an update that grows the tuple). As of now, we don't have a requirement for multiple tuples, but if there is a case, I think the API can be adapted. One more thing we do during repair-fragmentation is to arrange tuples in their offset order so that future sequential scans can be faster.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
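The single reorganization routine described above can be sketched as a toy model — plain Python, emphatically not zheap's actual implementation; the function name, page representation, and return values are all invented here for illustration:

```python
def reorganize_page(page_size, tuples, min_sizes=None):
    """Toy model of a single page-reorganization routine.

    'tuples' maps offset number -> current tuple size.  'min_sizes' maps
    offset number -> minimum space to reserve for a pending in-place
    update (must be >= the current size).  Tuples are packed contiguously
    in offset order -- as Amit notes, keeping offset order helps later
    sequential scans -- and all leftover space ends up in one contiguous
    hole.  Returns ({offset: (start, reserved_size)}, hole_size).
    """
    min_sizes = min_sizes or {}
    layout = {}
    pos = 0
    for off in sorted(tuples):
        size = max(tuples[off], min_sizes.get(off, 0))
        layout[off] = (pos, size)
        pos += size
    hole = page_size - pos
    if hole < 0:
        raise ValueError("tuples no longer fit on the page")
    return layout, hole
```

Called with no min_sizes, this is a straight defragmentation; called with min_sizes, it reserves exactly the requested room for the tuples about to grow, matching the two scenarios Robert describes.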
Re: Add pg_partition_root to get top-most parent of a partition tree
On Thu, Dec 06, 2018 at 10:48:59PM -0300, Alvaro Herrera wrote: > I think adding a pg_partition_root() call to as many pg_partition_tree > tests as you modified is overkill ... OTOH I'd have one test that > invokes pg_partition_tree(pg_partition_root(some-partition)) to verify > that starting from any point in the tree you get the whole tree. Good idea, thanks for the input. > I haven't actually tried to write a query that obtains a tree of > constraints using this, mind ... Sure. It would be good to agree on an interface. I have not tried either, but you should be able to get away with a join on relid returned by pg_partition_tree() with pg_constraint.conrelid with pg_get_constraintdef() instead of a WITH RECURSIVE, no? -- Michael signature.asc Description: PGP signature
Re: Add pg_partition_root to get top-most parent of a partition tree
I think adding a pg_partition_root() call to as many pg_partition_tree tests as you modified is overkill ... OTOH I'd have one test that invokes pg_partition_tree(pg_partition_root(some-partition)) to verify that starting from any point in the tree you get the whole tree.

I haven't actually tried to write a query that obtains a tree of constraints using this, mind ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Add pg_partition_root to get top-most parent of a partition tree
Hi all,

Álvaro has faced a use case where it would be useful to have a function which is able to return the top-most parent of a partition tree:
https://postgr.es/m/20181204184159.eue3wlchqrkh4vsc@alvherre.pgsql

This has been mentioned as well on the thread where pg_partition_tree was discussed, but it got shaved from the committed patch as many things happened when discussing the thing. Attached is a patch to do the work, which includes documentation and tests. An argument could be made to include the top-most parent as part of pg_partition_tree, but it feels more natural to me to have a separate function. This makes sure to handle invalid relations by returning NULL, and it generates an error for an incorrect relkind.

I have included as well a fix for the recent crash of pg_partition_tree I have reported, but let's discuss the crash on its own thread here:
https://www.postgresql.org/message-id/20181207010406.go2...@paquier.xyz
The bug fix would most likely get committed first, and I'll rebase this patch as need be. I am adding this patch to the CF of January.

I think that Amit should also be marked as a co-author of this patch, as it's inspired from what has been submitted previously, still I have not reused the code.

Thanks,
--
Michael

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b3336ea9be..dbec132188 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20270,6 +20270,17 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 their partitions, and so on.
+
+
+pg_partition_root
+pg_partition_root(regclass)
+
+ regclass
+
+Return the top-most parent of a partition tree for the given
+partitioned table or partitioned index.
+ + diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c index 78dd2b542b..611a33d0e1 100644 --- a/src/backend/utils/adt/partitionfuncs.c +++ b/src/backend/utils/adt/partitionfuncs.c @@ -23,7 +23,38 @@ #include "funcapi.h" #include "utils/fmgrprotos.h" #include "utils/lsyscache.h" +#include "utils/syscache.h" +/* + * Perform several checks on a relation on which is extracted some + * information related to its partition tree. Returns false if the + * relation cannot be processed, in which case it is up to the caller + * to decide what to do instead of an error. + */ +static bool +check_rel_for_partition_info(Oid relid) +{ + char relkind; + + /* Check if relation exists */ + if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid))) + return false; + + relkind = get_rel_relkind(relid); + + /* Only allow relation types that can appear in partition trees. */ + if (relkind != RELKIND_RELATION && + relkind != RELKIND_FOREIGN_TABLE && + relkind != RELKIND_INDEX && + relkind != RELKIND_PARTITIONED_TABLE && + relkind != RELKIND_PARTITIONED_INDEX) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is not a table, a foreign table, or an index", + get_rel_name(relid; + + return true; +} /* * pg_partition_tree @@ -38,20 +69,11 @@ pg_partition_tree(PG_FUNCTION_ARGS) { #define PG_PARTITION_TREE_COLS 4 Oid rootrelid = PG_GETARG_OID(0); - char relkind = get_rel_relkind(rootrelid); FuncCallContext *funcctx; ListCell **next; - /* Only allow relation types that can appear in partition trees. 
*/ - if (relkind != RELKIND_RELATION && - relkind != RELKIND_FOREIGN_TABLE && - relkind != RELKIND_INDEX && - relkind != RELKIND_PARTITIONED_TABLE && - relkind != RELKIND_PARTITIONED_INDEX) - ereport(ERROR, -(errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("\"%s\" is not a table, a foreign table, or an index", - get_rel_name(rootrelid; + if (!check_rel_for_partition_info(rootrelid)) + PG_RETURN_NULL(); /* stuff done only on the first call of the function */ if (SRF_IS_FIRSTCALL()) @@ -152,3 +174,39 @@ pg_partition_tree(PG_FUNCTION_ARGS) /* done when there are no more elements left */ SRF_RETURN_DONE(funcctx); } + +/* + * pg_partition_root + * + * For the given relation part of a partition tree, return its top-most + * root parent. + */ +Datum +pg_partition_root(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + Oid rootrelid; + List *ancestors; + + if (!check_rel_for_partition_info(relid)) + PG_RETURN_NULL(); + + /* + * If the relation is not a partition, return itself as a result. + */ + if (!get_rel_relispartition(relid)) + PG_RETURN_OID(relid); + + /* Fetch the top-most parent */ + ancestors = get_partition_ancestors(relid); + rootrelid = llast_oid(ancestors); + list_free(ancestors); + + /* + * If the relation is actually a partition, 'rootrelid' has been set to + * the OID of the root table in the partition tree. It should be a valid + * valid per the previous check for partition leaf above. + */ + Assert(OidIsValid(rootrelid)); + PG_RETURN_OID(rootrelid); +} diff --git
pg_partition_tree crashes for a non-defined relation
Hi all, While testing another patch, I have bumped into the issue of $subject... I should have put some more negative testing from the start on this stuff, here is a culprit query when passing directly an OID: select pg_partition_tree(0); I think that we should make the function return NULL if the relation defined does not exist, as we usually do for system-facing functions. It is also easier for the caller to know that the relation does not exist instead of having a plpgsql try/catch wrapper or such. Thoughts? -- Michael diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c index 78dd2b542b..8748869511 100644 --- a/src/backend/utils/adt/partitionfuncs.c +++ b/src/backend/utils/adt/partitionfuncs.c @@ -23,6 +23,7 @@ #include "funcapi.h" #include "utils/fmgrprotos.h" #include "utils/lsyscache.h" +#include "utils/syscache.h" /* @@ -42,6 +43,9 @@ pg_partition_tree(PG_FUNCTION_ARGS) FuncCallContext *funcctx; ListCell **next; + if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(rootrelid))) + PG_RETURN_NULL(); + /* Only allow relation types that can appear in partition trees. 
*/ if (relkind != RELKIND_RELATION && relkind != RELKIND_FOREIGN_TABLE && diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out index 6b116125e6..2df48fcc1c 100644 --- a/src/test/regress/expected/partition_info.out +++ b/src/test/regress/expected/partition_info.out @@ -6,6 +6,12 @@ SELECT * FROM pg_partition_tree(NULL); ---+-++--- (0 rows) +SELECT * FROM pg_partition_tree(0); + relid | parentrelid | isleaf | level +---+-++--- + | || +(1 row) + -- Test table partition trees CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a); CREATE TABLE ptif_test0 PARTITION OF ptif_test diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql index 5a76f22b05..9b55a7fe5c 100644 --- a/src/test/regress/sql/partition_info.sql +++ b/src/test/regress/sql/partition_info.sql @@ -2,6 +2,7 @@ -- Tests for pg_partition_tree -- SELECT * FROM pg_partition_tree(NULL); +SELECT * FROM pg_partition_tree(0); -- Test table partition trees CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a); signature.asc Description: PGP signature
RE: speeding up planning with partitions
Hi, Amit

Thanks for the quick modification.

On Wed, Dec 5, 2018 at 8:26 PM, Amit Langote wrote:
>> 1. ...
>> 5.
>> 0003: line 1620-1621
>>
>> + * After creating the RelOptInfo for the given child RT index, it goes
>> + * on to initialize some of its fields base on the parent RelOptInfo.
>>
>> s/fields base on/fields based on/
>
> Fixed all of 1-5.

Thanks for fixing.

>> 6.
>> parsenodes.h
>> 906 *inh is true for relation references that should be expanded to
>>     include
>> 907 *inheritance children, if the rel has any. This *must* be false
>>     for
>> 908 *RTEs other than RTE_RELATION entries.
>>
>> I think inh can become true now even if RTEKind equals RTE_SUBQUERY, so
>> the latter sentence needs to be modified.
>
> Seems like an existing comment bug. Why don't you send a patch as you
> discovered it? :)

Thanks, I am pleased with your proposal. I'll post it as a small fix of the comment.

>> 7.
>> 0005: line 109-115 ...
>
> Un-pruned partitions may become dummy due to contradictory constraints
> or constraint exclusion using normal CHECK constraints later, and
> whether they are dummy is checked properly by functions that iterate
> over live_parts.

Ah, I understand: partitions are eliminated by contradictory constraints or by constraint exclusion, both using constraints.

> Attached updated patches. I have a few other changes in mind to make to
> 0001 such that the range table in each child's version of Query contains
> only that child table in place of the original target relation, instead
> of *all* child tables, which is the current behavior. The current
> behavior makes range_table_mutator a big bottleneck when the number of
> un-pruned target children is large. But I'm saving it for the next week
> so that I can prepare for the PGConf.ASIA that's starting on Monday next
> week. See you there. :)

OK. I will continue the review of 0001 before/after your supplying of the next patch, keeping those in mind.

Yeah, see you there. :)
--
Yoshikazu Imai
Re: Hint and detail punctuation
On Fri, Dec 07, 2018 at 01:27:52AM +0100, Daniel Gustafsson wrote: > Oh.. sorry about that, I should’ve known better. Thanks for tidying up! No problem. Thanks for the report and for caring about the matter. So do I. -- Michael
Re: Hint and detail punctuation
> On 6 Dec 2018, at 23:51, Michael Paquier wrote: > > On Thu, Dec 06, 2018 at 09:44:22AM -0300, Alvaro Herrera wrote: >> LGTM. > > Thanks Álvaro for the additional lookup. I have committed the patch, > after updating the regression test output as this stuff was forgotten, > but that was easy enough to fix. Oh.. sorry about that, I should’ve known better. Thanks for tidying up! cheers ./daniel
Re: don't create storage when unnecessary
On Thu, Dec 06, 2018 at 06:55:52PM -0300, Alvaro Herrera wrote: > Some time ago, after partitioned indexes had been pushed, I realized > that even though I didn't want them to have relfilenodes, they did. And > looking closer I noticed that *a lot* of relation kinds that didn't need > relfilenodes, had them anyway. > > This patch fixes that; if no relfilenode is needed, it's not created. > > I didn't verify pg_upgrade behavior across this commit. Maybe something > needs tweaking there. > > PS: I think it'd be worth following up with this ... > https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=avcyh...@mail.gmail.com A macro makes sense to control that. Now I have to admit that I don't like your solution. Wouldn't it be cleaner to assign InvalidOid to relfilenode in such cases? The callers of heap_create would need to be made smarter when they now pass down a relfilenode (looking at you, DefineIndex!), but that seems way more consistent to me. Some tests would also be welcome. -- Michael
Re: Use durable_unlink for .ready and .done files for WAL segment removal
On 12/6/18, 4:54 PM, "Michael Paquier" wrote: > On Thu, Dec 06, 2018 at 02:43:35PM +0900, Michael Paquier wrote: >> Why? A WARNING would be logged if the first unlink() fails, and >> another, different WARNING would be logged if the subsequent fsync >> fails. It looks enough to me to make a distinction between both. Now, >> you may have a point in the fact that we could also live with only using >> unlink() for this code path, as even on repetitive crashes this would >> take care of removing orphan archive status files consistently. > > After sleeping on that, using plain unlink() makes indeed the most > sense. Any objections if I move on with that, adding a proper comment > explaining the choice? I don't plan to finish wrapping this patch today > but Monday my time anyway. That seems reasonable to me. Nathan
Re: amcheck verification for GiST
On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin wrote: > (0001-GiST-verification-function-for-amcheck-v2.patch) Thanks for working on this. Some feedback: * You do this: > +/* Check of an internal page. Hold locks on two pages at a time > (parent+child). */ This isn't consistent with what you do within verify_nbtree.c, which deliberately avoids ever holding more than a single buffer lock at a time, on general principle. That isn't necessarily a reason why you have to do the same, but it's not clear why you do things that way. Why isn't it enough to have a ShareLock on the relation? Maybe this is a sign that it would be a good idea to always operate on a palloc()'d copy of the page, by introducing something equivalent to palloc_btree_page(). (That would also be an opportunity to do very basic checks on every page.) * You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around. Certainly, there should be one at the top of the main loop. * Maybe gist_index_check() should be called gist_index_parent_check(), since it's rather like the existing verification function bt_index_parent_check(). * Alternatively, you could find a way to make your function only need an AccessShareLock -- that would make gist_index_check() an appropriate name. That would probably require careful thought about VACUUM. * Why is it okay to do this?: > + if (GistTupleIsInvalid(idxtuple)) > + ereport(LOG, > + (errmsg("index \"%s\" contains an inner tuple marked as > invalid", > + RelationGetRelationName(rel)), > +errdetail("This is caused by an incomplete page split at > crash recovery before upgrading to PostgreSQL 9.1."), > +errhint("Please REINDEX it."))); You should probably mention the gistdoinsert() precedent for this. * Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as possible, adding almost any check that I can think of, provided it hasn't got very high overhead. Note that gistcheckpage() doesn't do this for you. 
* Should we be concerned about the memory used by pushStackIfSplited()? * How about a cross-check between IndexTupleSize() and ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that we have this redundancy, which wastes space, but we do, so we might as well get some small benefit from it. -- Peter Geoghegan
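The IndexTupleSize()/ItemIdGetLength() cross-check suggested above can be illustrated with a standalone sketch. The structs and function names below are simplified stand-ins, not the real PostgreSQL headers; the assumption (true for B-Tree index tuples) is that line-pointer lengths are stored MAXALIGN'd, so the two independently stored sizes must agree up to alignment padding:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

/* simplified stand-ins for ItemIdData and IndexTupleData */
typedef struct { uint16_t lp_len; } ItemIdDemo;     /* length per the line pointer */
typedef struct { uint16_t t_info; } IndexTupleDemo; /* low bits hold the tuple size */

/* size as reported by the tuple header itself */
static size_t index_tuple_size(const IndexTupleDemo *tup)
{
    return tup->t_info & 0x1fff;    /* size mask, mimicking the real header */
}

/*
 * The redundancy check: the line-pointer length must equal the
 * MAXALIGN'd size computed from the tuple header.  A mismatch
 * indicates corruption in one of the two places.
 */
static bool tuple_lengths_consistent(const ItemIdDemo *itemid,
                                     const IndexTupleDemo *tup)
{
    return itemid->lp_len == MAXALIGN(index_tuple_size(tup));
}
```

Since the two lengths are redundant on disk anyway, a checker gets this corruption coverage essentially for free.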
Re: psql display of foreign keys
On Thu, Dec 06, 2018 at 01:26:30PM -0300, Alvaro Herrera wrote: > That means I can just get pg_partition_root() done first, and try to > write the queries using that instead of WITH RECURSIVE. FWIW, I was just writing a patch about this one, so I was going to spawn a new thread today :) Let's definitely avoid WITH RECURSIVE if we can. -- Michael
Re: Use durable_unlink for .ready and .done files for WAL segment removal
On Thu, Dec 06, 2018 at 02:43:35PM +0900, Michael Paquier wrote: > Why? A WARNING would be logged if the first unlink() fails, and > another, different WARNING would be logged if the subsequent fsync > fails. It looks enough to me to make a distinction between both. Now, > you may have a point in the fact that we could also live with only using > unlink() for this code path, as even on repetitive crashes this would > take care of removing orphan archive status files consistently. After sleeping on that, using plain unlink() makes indeed the most sense. Any objections if I move on with that, adding a proper comment explaining the choice? I don't plan to finish wrapping this patch today but Monday my time anyway. -- Michael
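The durable-unlink approach being weighed in this thread can be sketched as a standalone POSIX demonstration. The helper name is hypothetical and the sketch is deliberately simplified relative to PostgreSQL's actual durable_unlink() in fd.c; what it shows is why there are two distinct failure modes (and hence two distinct WARNINGs) to consider:

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of a "durable" unlink: remove the file, then fsync the parent
 * directory so the removal of the directory entry itself survives a
 * crash.  Either step can fail independently.
 */
int durable_unlink_sketch(const char *path)
{
    char dirbuf[1024];
    int  fd;

    if (unlink(path) < 0)
        return -1;              /* failure mode 1: the unlink itself */

    /* dirname() may modify its argument, so work on a copy */
    strncpy(dirbuf, path, sizeof(dirbuf) - 1);
    dirbuf[sizeof(dirbuf) - 1] = '\0';

    fd = open(dirname(dirbuf), O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) < 0)          /* failure mode 2: flushing the directory */
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

The simplification Michael settles on amounts to dropping the directory fsync for this code path, accepting that a crash might resurrect an orphan .ready/.done file that a later pass would remove anyway.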
Re: Hint and detail punctuation
On Thu, Dec 06, 2018 at 09:44:22AM -0300, Alvaro Herrera wrote: > LGTM. Thanks Álvaro for the additional lookup. I have committed the patch, after updating the regression test output as this stuff was forgotten, but that was easy enough to fix. -- Michael
Re: don't create storage when unnecessary
Hi, On 2018-12-06 18:55:52 -0300, Alvaro Herrera wrote: > Some time ago, after partitioned indexes had been pushed, I realized > that even though I didn't want them to have relfilenodes, they did. And > looking closer I noticed that *a lot* of relation kinds that didn't need > relfilenodes, had them anyway. > > This patch fixes that; if no relfilenode is needed, it's not created. > > I didn't verify pg_upgrade behavior across this commit. Maybe something > needs tweaking there. Hm, that generally sounds like a good plan. Could we back this up with tests in misc_sanity.sql or such? - Andres
[PATCH] Allow anonymous rowtypes in function return column definition
Currently, the following query SELECT q.b = row(2) FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record); would fail with ERROR: column "b" has pseudo-type record This is due to CheckAttributeNamesTypes() being used on a function coldeflist as if it was a real relation definition. But in the context of a query there seems to be no harm in allowing this, as other ways of manipulating anonymous rowtypes work well, e.g.: SELECT (ARRAY[ROW(1, ROW(2))])[1]; Elvis >From 873ecd6b31abc28c787f398d78ba2511c6e712a2 Mon Sep 17 00:00:00 2001 From: Elvis Pranskevichus Date: Thu, 6 Dec 2018 17:16:28 -0500 Subject: [PATCH] Allow anonymous rowtypes in function return column definition Currently, the following query SELECT q.b = row(2) FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record); would fail with ERROR: column "b" has pseudo-type record This is due to CheckAttributeNamesTypes() being used on a function coldeflist as if it was a real relation definition. But in the context of a query there seems to be no harm in allowing this, as other ways of manipulating anonymous rowtypes work well, e.g.: SELECT (ARRAY[ROW(1, ROW(2))])[1]; --- src/backend/catalog/heap.c | 23 +++ src/backend/catalog/index.c| 2 +- src/backend/commands/tablecmds.c | 4 ++-- src/backend/parser/parse_relation.c| 2 +- src/include/catalog/heap.h | 6 -- src/test/regress/expected/rowtypes.out | 7 +++ src/test/regress/sql/rowtypes.sql | 2 ++ 7 files changed, 32 insertions(+), 14 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 8c52a1543d..ab9cb600f1 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -412,7 +412,8 @@ heap_create(const char *relname, */ void CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind, - bool allow_system_table_mods) + bool allow_system_table_mods, + bool allow_anonymous_records) { int i; int j; @@ -471,7 +472,8 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind, TupleDescAttr(tupdesc, i)->atttypid, 
TupleDescAttr(tupdesc, i)->attcollation, NIL, /* assume we're creating a new rowtype */ - allow_system_table_mods); + allow_system_table_mods, + allow_anonymous_records); } } @@ -494,7 +496,8 @@ void CheckAttributeType(const char *attname, Oid atttypid, Oid attcollation, List *containing_rowtypes, - bool allow_system_table_mods) + bool allow_system_table_mods, + bool allow_anonymous_records) { char att_typtype = get_typtype(atttypid); Oid att_typelem; @@ -507,7 +510,8 @@ CheckAttributeType(const char *attname, * catalogs (this allows creating pg_statistic and cloning it during * VACUUM FULL) */ - if (atttypid != ANYARRAYOID || !allow_system_table_mods) + if (!((atttypid == ANYARRAYOID && allow_system_table_mods) || +(atttypid == RECORDOID && allow_anonymous_records))) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("column \"%s\" has pseudo-type %s", @@ -520,7 +524,8 @@ CheckAttributeType(const char *attname, */ CheckAttributeType(attname, getBaseType(atttypid), attcollation, containing_rowtypes, - allow_system_table_mods); + allow_system_table_mods, + allow_anonymous_records); } else if (att_typtype == TYPTYPE_COMPOSITE) { @@ -558,7 +563,8 @@ CheckAttributeType(const char *attname, CheckAttributeType(NameStr(attr->attname), attr->atttypid, attr->attcollation, containing_rowtypes, - allow_system_table_mods); + allow_system_table_mods, + allow_anonymous_records); } relation_close(relation, AccessShareLock); @@ -572,7 +578,8 @@ CheckAttributeType(const char *attname, */ CheckAttributeType(attname, att_typelem, attcollation, containing_rowtypes, - allow_system_table_mods); + allow_system_table_mods, + allow_anonymous_records); } /* @@ -1063,7 +1070,7 @@ heap_create_with_catalog(const char *relname, */ Assert(IsNormalProcessingMode() || IsBootstrapProcessingMode()); - CheckAttributeNamesTypes(tupdesc, relkind, allow_system_table_mods); + CheckAttributeNamesTypes(tupdesc, relkind, allow_system_table_mods, false); /* * This would fail later on anyway, 
if the relation already exists. But diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 44625a507b..ac5a285be7 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -435,7 +435,7 @@ ConstructTupleDescriptor(Relation heapRelation, */ CheckAttributeType(NameStr(to->attname), to->atttypid, to->attcollation, - NIL, false); + NIL, false, false); } /* diff --git a/src/backend/commands/tablecmds.c
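The core of the patch above is the rewritten pseudo-type condition in CheckAttributeType(). Its logic can be isolated as a small truth table; the names and OID values below are illustrative stand-ins for the backend symbols, and the check applies only once a type is already known to be a pseudo-type:

```c
#include <stdbool.h>

/* stand-ins for the real type OIDs; values are illustrative only */
enum { ANYARRAYOID = 2277, RECORDOID = 2249 };

/*
 * Mirrors the patched condition: a pseudo-type attribute is rejected
 * unless it is ANYARRAY with system-table mods allowed (the existing
 * pg_statistic escape hatch), or RECORD with anonymous records allowed
 * (the new coldeflist case).
 */
static bool attribute_type_rejected(int atttypid,
                                    bool allow_system_table_mods,
                                    bool allow_anonymous_records)
{
    return !((atttypid == ANYARRAYOID && allow_system_table_mods) ||
             (atttypid == RECORDOID && allow_anonymous_records));
}
```

The original condition, `atttypid != ANYARRAYOID || !allow_system_table_mods`, only carved out the ANYARRAY case; the rewrite keeps that behavior while adding exactly one new allowed combination.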
rewrite ExecPartitionCheckEmitError
Just on cleanliness grounds, I propose to rewrite the function in $SUBJECT. I came across this while reviewing some already-committed patch for partition pruning, and it's been sitting in my laptop ever since. I think the current coding is too convoluted and hard to follow. The patch makes it much simpler (IMO). -- Álvaro Herrera >From 8c8ac713ac168b2678925e58cdfc83694a8fa5b9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 14 Nov 2018 20:57:19 -0300 Subject: [PATCH] rewrite ExecPartitionCheckEmitError --- src/backend/executor/execMain.c | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index d83d296d82..78c8ce4935 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1837,11 +1837,9 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { - Relation rel = resultRelInfo->ri_RelationDesc; - Relation orig_rel = rel; - TupleDesc tupdesc = RelationGetDescr(rel); - char *val_desc; - Bitmapset *modifiedCols; + Oid root_relid; + TupleDesc tupdesc; + char *valdsc; Bitmapset *insertedCols; Bitmapset *updatedCols; @@ -1851,11 +1849,13 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo, */ if (resultRelInfo->ri_PartitionRoot) { - TupleDesc old_tupdesc = RelationGetDescr(rel); + TupleDesc old_tupdesc; AttrNumber *map; - rel = resultRelInfo->ri_PartitionRoot; - tupdesc = RelationGetDescr(rel); + root_relid = RelationGetRelid(resultRelInfo->ri_PartitionRoot); + + old_tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); + tupdesc = RelationGetDescr(resultRelInfo->ri_PartitionRoot); /* a reverse map */ map = convert_tuples_by_name_map_if_req(old_tupdesc, tupdesc, gettext_noop("could not convert row type")); @@ -1868,20 +1868,24 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo, slot = execute_attr_map_slot(map, slot, MakeTupleTableSlot(tupdesc, )); } + else + { + root_relid = 
RelationGetRelid(resultRelInfo->ri_RelationDesc); + tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); + } insertedCols = GetInsertedColumns(resultRelInfo, estate); updatedCols = GetUpdatedColumns(resultRelInfo, estate); - modifiedCols = bms_union(insertedCols, updatedCols); - val_desc = ExecBuildSlotValueDescription(RelationGetRelid(rel), - slot, - tupdesc, - modifiedCols, - 64); + valdsc = ExecBuildSlotValueDescription(root_relid, + slot, + tupdesc, + bms_union(insertedCols, updatedCols), + 64); ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION), errmsg("new row for relation \"%s\" violates partition constraint", - RelationGetRelationName(orig_rel)), - val_desc ? errdetail("Failing row contains %s.", val_desc) : 0)); + RelationGetRelationName(resultRelInfo->ri_RelationDesc)), + valdsc ? errdetail("Failing row contains %s.", valdsc) : 0)); } /* -- 2.11.0
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Sun, Dec 2, 2018 at 4:21 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > About the status of the patch, to me it should be RWF. It's been > > moved to the next CF several times with no progress besides rebases. > > Let me disagree. Judging from the commentaries in this discussion it could be > significant and useful feature, and the author is trying to keep this patch > uptodate. The lack of reviews could be due other reasons than desirability of > the patch (as well as as for many other interesting proposals in hackers). +1. I, for one, care about this feature. We're all very busy, but I don't want to see it die. -- Peter Geoghegan
don't create storage when unnecessary
Some time ago, after partitioned indexes had been pushed, I realized that even though I didn't want them to have relfilenodes, they did. And looking closer I noticed that *a lot* of relation kinds that didn't need relfilenodes, had them anyway. This patch fixes that; if no relfilenode is needed, it's not created. I didn't verify pg_upgrade behavior across this commit. Maybe something needs tweaking there. PS: I think it'd be worth following up with this ... https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=avcyh...@mail.gmail.com -- Álvaro Herrera >From 25d80040ffe2fa4fa173b4db7a6e91461542fef3 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 9 Aug 2018 16:18:08 -0400 Subject: [PATCH] don't create storage when not necessary --- src/backend/catalog/heap.c | 2 +- src/backend/utils/cache/relcache.c | 8 src/include/utils/rel.h| 10 ++ 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 11debaa780..6f73ff4fcf 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -372,7 +372,7 @@ heap_create(const char *relname, */ if (OidIsValid(relfilenode)) create_storage = false; - else + else if (create_storage) relfilenode = relid; /* diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index c3071db1cd..76a4cc65be 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -1253,6 +1253,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt) static void RelationInitPhysicalAddr(Relation relation) { + /* these relations kinds have no storage */ + if (relation->rd_rel->relkind == RELKIND_VIEW || + relation->rd_rel->relkind == RELKIND_COMPOSITE_TYPE || + relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE || + relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE || + relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + return; + if (relation->rd_rel->reltablespace) relation->rd_node.spcNode = 
relation->rd_rel->reltablespace; else diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index 2217081dcc..6ac40fd84c 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -451,12 +451,14 @@ typedef struct ViewOptions /* * RelationIsMapped * True if the relation uses the relfilenode map. - * - * NB: this is only meaningful for relkinds that have storage, else it - * will misleadingly say "true". */ #define RelationIsMapped(relation) \ - ((relation)->rd_rel->relfilenode == InvalidOid) + (((relation)->rd_rel->relfilenode == InvalidOid) && \ + ((relation)->rd_rel->relkind == RELKIND_RELATION || \ + (relation)->rd_rel->relkind == RELKIND_INDEX || \ + (relation)->rd_rel->relkind == RELKIND_SEQUENCE || \ + (relation)->rd_rel->relkind == RELKIND_TOASTVALUE || \ + (relation)->rd_rel->relkind == RELKIND_MATVIEW)) /* * RelationOpenSmgr -- 2.11.0
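The relkinds the patch treats as storage-less can be captured in a single predicate. This is a sketch mirroring the lists the patch adds to RelationInitPhysicalAddr() and RelationIsMapped; the one-letter codes match pg_class.relkind, but the helper itself is hypothetical:

```c
#include <stdbool.h>

/* one-letter relkind codes as stored in pg_class.relkind */
#define RELKIND_RELATION          'r'
#define RELKIND_INDEX             'i'
#define RELKIND_SEQUENCE          'S'
#define RELKIND_TOASTVALUE        't'
#define RELKIND_VIEW              'v'
#define RELKIND_MATVIEW           'm'
#define RELKIND_COMPOSITE_TYPE    'c'
#define RELKIND_FOREIGN_TABLE     'f'
#define RELKIND_PARTITIONED_TABLE 'p'
#define RELKIND_PARTITIONED_INDEX 'I'

/*
 * True if a relation of this kind is backed by on-disk storage and
 * therefore needs a relfilenode.  Views, composite types, foreign
 * tables, and partitioned tables/indexes have no storage of their own.
 */
static bool relkind_has_storage(char relkind)
{
    switch (relkind)
    {
        case RELKIND_RELATION:
        case RELKIND_INDEX:
        case RELKIND_SEQUENCE:
        case RELKIND_TOASTVALUE:
        case RELKIND_MATVIEW:
            return true;
        default:
            return false;
    }
}
```

Centralizing this in one predicate is essentially the macro Michael asks for in his review; the patch as posted instead enumerates the storage-less kinds at each call site.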
Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
On Thu, Dec 6, 2018 at 12:51 PM Alexander Korotkov wrote: > So, algorithm introduced by 218f51584d5 is broken. It tries to > guarantee that there are no inserters in the subtree by placing > cleanup lock to subtree root, assuming that inserter holds pins on the > path from root to leaf. But due to concurrent splits of internal > pages the pins held can be not relevant to actual path. I don't see > the way to fix this easily. So, I think we should revert it from back > branches and try to reimplement that in master. > > However, I'd like to note that 218f51584d5 introduces two changes: > 1) Cleanup locking only if there pages to delete > 2) Cleanup locking only subtree root > The 2nd one is broken. But the 1st one seems still good for me and > useful, because in vast majority of cases vacuum doesn't delete any > index pages. So, I propose to revert 218f51584d5, but leave there > logic, which locks root for cleanup only once there are pages to > delete. Any thoughts? Can you post a patch that just removes the 2nd part, leaving the still-correct first part? Your proposal sounds reasonable, but I'd like to see what you have in mind in detail before commenting. Thanks -- Peter Geoghegan
Re: Compressed TOAST Slicing
On Sun, Dec 2, 2018 at 7:03 AM Rafia Sabih wrote: > > The idea looks good and believing your performance evaluation it seems > like a practical one too. Thank you kindly for the review! > A comment explaining how this check differs for is_slice case would be helpful. > Looks like PG indentation is not followed here for n. I have attached updated patches that add the comment and adhere to the Pg variable declaration indentation styles. ATB! P -- Paul Ramsey http://crunchydata.com Attachments: compressed-datum-slicing-left-20190103a.patch, compressed-datum-slicing-20190103a.patch
Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
On Thu, Dec 6, 2018 at 10:56 PM Alexander Korotkov wrote: > On Tue, Dec 4, 2018 at 8:01 PM Andres Freund wrote: > > On 2018-11-10 17:42:16 -0800, Peter Geoghegan wrote: > > > On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan wrote: > > > > Teodor: Do you think that the issue is fixable? It looks like there > > > > are serious issues with the design of 218f51584d5 to me. I don't think > > > > the general "there can't be any inserters at this subtree" thing works > > > > given that we have to couple buffer locks when moving right for other > > > > reasons. We call ginStepRight() within ginFinishSplit(), for reasons > > > > that date back to bug fix commit ac4ab97e from 2013 -- that detail is > > > > probably important, because it seems to be what breaks the subtree > > > > design (we don't delete in two phases or anything in GIN). > > > > > > Ping? > > > > > > This is a thorny problem, and I'd like to get your input soon. I > > > suspect that reverting 218f51584d5 may be the ultimate outcome, even > > > though it's a v10 feature. > > > > Teodor, any chance for a response here? It's not OK to commit something > > and then just not respond to bug-reports, especially when they're well > > diagnose and clearly point towards issues in a commit of yours. > > Unfortunately, Teodor is unreachable at least till next week. > > > Alexander, CCing you, perhaps you could point the thread out to Teodor? > > I'll take a look at this issue. I've run through the thread. So, algorithm introduced by 218f51584d5 is broken. It tries to guarantee that there are no inserters in the subtree by placing cleanup lock to subtree root, assuming that inserter holds pins on the path from root to leaf. But due to concurrent splits of internal pages the pins held can be not relevant to actual path. I don't see the way to fix this easily. So, I think we should revert it from back branches and try to reimplement that in master. 
However, I'd like to note that 218f51584d5 introduces two changes: 1) Cleanup locking only if there pages to delete 2) Cleanup locking only subtree root The 2nd one is broken. But the 1st one seems still good for me and useful, because in vast majority of cases vacuum doesn't delete any index pages. So, I propose to revert 218f51584d5, but leave there logic, which locks root for cleanup only once there are pages to delete. Any thoughts? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: minor fix in CancelVirtualTransaction
Same patch, commit message added. Adding to commitfest. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 2cbe6fe5ac7617e424a63b820a6a2c83b712bab5 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 6 Dec 2018 17:30:58 -0300 Subject: [PATCH] Optimize CancelVirtualTransaction a little bit When scanning the list of virtual transactions looking for one particular vxid, we cancel the transaction *and* break out of the loop only if both backendId and localTransactionId matches us. The canceling part is okay, but limiting the break to that case is pointless: if we found the correct backendId, there cannot be any other entry with the same backendId. Rework the loop so that we break out of it if backendId matches even if the localTransactionId part doesn't. Discussion: https://postgr.es/m/20180629170512.d6kof6l6uu3qqpd6@alvherre.pgsql --- src/backend/storage/ipc/procarray.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index dc7e875680..5ed588d9be 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2683,18 +2683,20 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode) GET_VXID_FROM_PGPROC(procvxid, *proc); - if (procvxid.backendId == vxid.backendId && - procvxid.localTransactionId == vxid.localTransactionId) + if (procvxid.backendId == vxid.backendId) { - proc->recoveryConflictPending = true; - pid = proc->pid; - if (pid != 0) + if (procvxid.localTransactionId == vxid.localTransactionId) { -/* - * Kill the pid if it's still here. If not, that's what we - * wanted so ignore any errors. - */ -(void) SendProcSignal(pid, sigmode, vxid.backendId); +proc->recoveryConflictPending = true; +pid = proc->pid; +if (pid != 0) +{ + /* + * Kill the pid if it's still here. If not, that's what we + * wanted so ignore any errors. 
+ */ + (void) SendProcSignal(pid, sigmode, vxid.backendId); +} } break; } -- 2.11.0
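The point of the patch above is that backendId is unique in the proc array, so the scan can stop as soon as the backendId matches, even when the localTransactionId does not. That can be shown with a toy scan; the types below are hypothetical stand-ins, not the real PGPROC machinery:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    int backendId;
    int localTransactionId;
} VXidDemo;

/*
 * Scan for an exact vxid match, counting entries visited.  Because each
 * backendId appears at most once in the array, we can stop as soon as
 * the backendId matches, whether or not the localTransactionId does:
 * the rest of the array cannot contain the target.
 */
static bool find_vxid(const VXidDemo *arr, size_t n, VXidDemo target,
                      size_t *visited)
{
    *visited = 0;
    for (size_t i = 0; i < n; i++)
    {
        (*visited)++;
        if (arr[i].backendId == target.backendId)
            return arr[i].localTransactionId == target.localTransactionId;
    }
    return false;
}
```

The pre-patch coding only broke out on a full (backendId, localTransactionId) match, so a stale localTransactionId caused a pointless scan of the remaining entries.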
Re: Alter table documentation page (again)
On 2018-Dec-06, Lætitia Avrot wrote: > > I'd rename the action in ON DELETE/UPDATE to referential_action, both in > > alter_table and in create_table (the latter just for consistency). > > I love that "option 3" ! So clever! Thanks :-) I checked the SQL standard after sending that email, and indeed it calls those elements "referential action". Go figure. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Alter table documentation page (again)
> > > I'd rename the action in ON DELETE/UPDATE to referential_action, both in > alter_table and in create_table (the latter just for consistency). > I love that "option 3" ! So clever!
Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
On Tue, Dec 4, 2018 at 8:01 PM Andres Freund wrote: > On 2018-11-10 17:42:16 -0800, Peter Geoghegan wrote: > > On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan wrote: > > > Teodor: Do you think that the issue is fixable? It looks like there > > > are serious issues with the design of 218f51584d5 to me. I don't think > > > the general "there can't be any inserters at this subtree" thing works > > > given that we have to couple buffer locks when moving right for other > > > reasons. We call ginStepRight() within ginFinishSplit(), for reasons > > > that date back to bug fix commit ac4ab97e from 2013 -- that detail is > > > probably important, because it seems to be what breaks the subtree > > > design (we don't delete in two phases or anything in GIN). > > > > Ping? > > > > This is a thorny problem, and I'd like to get your input soon. I > > suspect that reverting 218f51584d5 may be the ultimate outcome, even > > though it's a v10 feature. > > Teodor, any chance for a response here? It's not OK to commit something > and then just not respond to bug-reports, especially when they're well > diagnose and clearly point towards issues in a commit of yours. Unfortunately, Teodor is unreachable at least till next week. > Alexander, CCing you, perhaps you could point the thread out to Teodor? I'll take a look at this issue. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Deleting no-op CoerceToDomains from plan trees
Attached is a complete patch for one of the tasks discussed in [1], namely dropping CoerceToDomain nodes from plan trees if their domains have no constraints that need to be checked. After I fleshed out the original patch to include sinval signaling for changes of domain constraints, I was surprised to find that the existing test case in domain.sql still failed. The reason turned out to be that the CoerceToDomain node was not being done as part of a normal plpgsql expression, but within a cast expression cached by plpgsql's get_cast_hashentry() function ... and that code has no provision for cache flushes. I concluded that the best way to fix that was to create new functionality in plancache.c for caching expression trees with appropriate invalidation support, so this patch includes that. (Conceivably that should be a separate patch, but it didn't really seem worth the trouble.) As patched, get_cast_hashentry() is the only user of the new plancache code. I looked around at other callers of expression_planner() to see if anything else was caching the result for more than one query. I found such a caller in typcache.c's domain constraint caching, but as discussed in [2], it's not really clear that there's any point in changing that code; I just added a comment about the issue instead. (I also noted some callers in relation partitioning code, which I'm a tad suspicious of, mainly because it's not apparent to me why that code should be executing arbitrary expressions in the first place. But if there's anything broken there, it seems like material for a separate discussion.) Some other notes for review: * In typecmds.c, I fixed some inconsistency about whether the various subcommands of ALTER DOMAIN released catalog locks or not. * I used a dlist to thread CachedExpressions together in plancache.c, and failed to resist the temptation to change CachedPlanSources to use a dlist as well. 
That eliminates a potential performance problem in DropCachedPlan, though I'm not sure how important that is. regards, tom lane [1] https://www.postgresql.org/message-id/5978.1544030...@sss.pgh.pa.us [2] https://www.postgresql.org/message-id/12539.1544107...@sss.pgh.pa.us diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 1ffc823..544f423 100644 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *** *** 62,67 --- 62,68 #include "parser/parse_type.h" #include "utils/builtins.h" #include "utils/fmgroids.h" + #include "utils/inval.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/rel.h" *** AlterDomainDefault(List *names, Node *de *** 2297,2303 ObjectAddressSet(address, TypeRelationId, domainoid); /* Clean up */ ! heap_close(rel, NoLock); heap_freetuple(newtuple); return address; --- 2298,2304 ObjectAddressSet(address, TypeRelationId, domainoid); /* Clean up */ ! heap_close(rel, RowExclusiveLock); heap_freetuple(newtuple); return address; *** AlterDomainDropConstraint(List *names, c *** 2494,2501 systable_endscan(conscan); heap_close(conrel, RowExclusiveLock); - heap_close(rel, NoLock); - if (!found) { if (!missing_ok) --- 2495,2500 *** AlterDomainDropConstraint(List *names, c *** 2509,2516 --- 2508,2525 constrName, TypeNameToString(typename; } + /* + * We must send out an sinval message for the domain, to ensure that any + * dependent plans get rebuilt. Since this command doesn't change the + * domain's pg_type row, that won't happen automatically; do it manually. 
+ */ + CacheInvalidateHeapTuple(rel, tup, NULL); + ObjectAddressSet(address, TypeRelationId, domainoid); + /* Clean up */ + heap_close(rel, RowExclusiveLock); + return address; } *** AlterDomainAddConstraint(List *names, No *** 2615,2620 --- 2624,2636 if (!constr->skip_validation) validateDomainConstraint(domainoid, ccbin); + /* + * We must send out an sinval message for the domain, to ensure that any + * dependent plans get rebuilt. Since this command doesn't change the + * domain's pg_type row, that won't happen automatically; do it manually. + */ + CacheInvalidateHeapTuple(typrel, tup, NULL); + ObjectAddressSet(address, TypeRelationId, domainoid); /* Clean up */ diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index c729a99..b645648 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *** adjust_paths_for_srfs(PlannerInfo *root, *** 5923,5932 * side-effect that is useful when the expression will get evaluated more than * once. Also, we must fix operator function IDs. * * Note: this must not make any damaging changes to the passed-in expression * tree. (It would actually be okay to
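The plancache mechanism Tom describes, caching a compiled expression but rebuilding it whenever an invalidation message arrives, reduces to a generation-counter pattern. This is a deliberately tiny sketch: the real code tracks per-object dependencies via sinval messages, not a single global counter:

```c
#include <stdbool.h>

/* global counter standing in for the sinval message stream */
static unsigned long sinval_generation = 0;

/* bump on any DDL that could invalidate cached expressions */
static void cache_invalidate_all(void)
{
    sinval_generation++;
}

typedef struct
{
    bool          valid;
    unsigned long generation;   /* sinval_generation at build time */
    int           value;        /* stand-in for the compiled expression */
} CachedExpressionDemo;

/* counting stub standing in for expression compilation */
static int compile_calls = 0;
static int compile_stub(void)
{
    compile_calls++;
    return 42;
}

/* return the cached value, recompiling only if an invalidation intervened */
static int cached_expression_get(CachedExpressionDemo *c, int (*compile)(void))
{
    if (!c->valid || c->generation != sinval_generation)
    {
        c->value = compile();
        c->generation = sinval_generation;
        c->valid = true;
    }
    return c->value;
}
```

The bug the patch fixes in plpgsql's get_cast_hashentry() is exactly a cache of this shape that never consulted the invalidation stream, so a dropped domain constraint left a stale compiled cast in place.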
How do I get altered object from GRANT event trigger?
I noticed that there is no information for GRANT in pg_event_trigger_ddl_commands(). I am trying to determine which table/schema is being altered. Is there any way to do this either with built-in functionality, or extending the current code in event_trigger.c? Is the reason for this limitation that GRANT does not perform the same kind of catalog changes that other commands do? Thanks! Jeremy
Re: Alter table documentation page (again)
On 2018-Dec-06, Lætitia Avrot wrote: > Any thought ? Which option do you prefer ? I can write the patch, but I'd > like to know what you think about that. I'd rename the action in ON DELETE/UPDATE to referential_action, both in alter_table and in create_table (the latter just for consistency). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
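For readers following along, the two senses of "action" under discussion can be put side by side. This is an illustration using today's standard syntax, not part of any proposed patch; the table and column names are invented:

```sql
-- "action" in the ALTER TABLE sense: the operation the statement performs.
ALTER TABLE orders
    ADD COLUMN shipped_at timestamptz;

-- "action" in the ON DELETE/ON UPDATE sense: what Álvaro proposes to call
-- referential_action (NO ACTION, RESTRICT, CASCADE, SET NULL, SET DEFAULT).
ALTER TABLE order_lines
    ADD CONSTRAINT order_lines_order_fk
    FOREIGN KEY (order_id) REFERENCES orders (id)
    ON DELETE CASCADE
    ON UPDATE NO ACTION;
```

Both statements are "actions" on the current ALTER TABLE page, which is the ambiguity being reported.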
Re: slow queries over information schema.tables
Robert Haas writes: > On Thu, Dec 6, 2018 at 12:03 PM Tom Lane wrote: >> [ further experimentation... ] It looks like if you prepare >> a query and then just execute it repeatedly in one transaction, >> you'd not reach AIM (as long as you were getting generic plans). >> Possibly that's a gap worth closing. > If we called it at the start of every query, couldn't we dispense with > the call in relation_openrv_extended()? No. You need to do AIM *after* obtaining the lock, else you still have the race condition that you can execute a query on a table without being aware of recent DDL on it. What we could possibly do to close the gap cited above is to have plancache.c's CheckCachedPlan force an AIM call if it notices that the plan it wants to re-use has a non-empty PlanInvalItems list. If it does not, then there is nothing that AIM would cause invalidation for anyway. This still leaves us with a query-duration-sized race condition window for DDL on functions and domains, but not any larger than that. regards, tom lane
Alter table documentation page (again)
Hi, I was trying to answer a former colleague's question about Postgres' default behaviour when deleting or updating, when she pointed out that the Postgres ALTER TABLE documentation page uses the 'action' keyword twice for two different things. And indeed she is right: - it's used to describe any action available in the ALTER TABLE statement - and also to reference the action taken on update or delete Here are my thoughts about that conflict: - We should keep the action keyword for the on update or on delete action, as that's what's done on the CREATE TABLE documentation page. - Option 1: We could categorize the "actions" we can do with ALTER TABLE (as column_action, trigger_action, constraint_action, rule_action, other_action), but I'm not a big fan of "other_action" and I can't think of another name. - Option 2: Or we could just rename the first action to "alter-table_action". Any thoughts? Which option do you prefer? I can write the patch, but I'd like to know what you think about that. Anyway, thanks to Brigitte Blanc-Lafay for having pointed this out! :-) Cheers, Lætitia -- *Think! Do you really need to print this email ? * *There is no Planet B.*
Re: proposal: plpgsql pragma statement
čt 6. 12. 2018 v 18:17 odesílatel Robert Haas napsal: > On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule > wrote: > > My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This > is not runtime statement - the information from this command will be > assigned to related object - function, block, command at parser time. > > That's sensible, but the syntax you were proposing didn't look like it > was related to a specific statement. I was objecting to the idea that > PRAGMA whatever; should be construed as an annotation of, > specifically, the following statement. Please, can you propose something you would like? For my purpose I can imagine PRAGMA at function level with the same syntax as PL/SQL - I need to attach some information somewhere that plpgsql_check can use to protect users against false alarms. The locality is not too important for me at this moment. But I prefer a solution that doesn't look too strange and is possible just by changing the plpgsql parser. Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: WIP: Avoid creation of the free space map for small tables
On 12/3/18, Amit Kapila wrote: > On Mon, Dec 3, 2018 at 11:15 AM John Naylor wrote: >> Per your recent comment, we no longer check relation size if we waited >> on a relation extension lock, so this is essentially a reversion to an >> earlier version. >> > > fsm_local_set is being called from RecordAndGetPageWithFreeSpace and > GetPageWithFreeSpace whereas the change we have discussed was specific > to GetPageWithFreeSpace, so not sure if we need any change in > fsm_local_set. Not needed, but I assumed wrongly you'd think it unclear otherwise. I've now restored the generality and updated the comments to be closer to v8. > It would be good if you add few comments atop functions > GetPageWithFreeSpace, RecordAndGetPageWithFreeSpace and > RecordPageWithFreeSpace about their interaction with local map. Done. Also additional minor comment editing. I've added an additional regression test for finding the right block and removed a test I thought was redundant. I've kept the test file in its own schedule. -John Naylor From 7494ff89be7ad4e07f277f3bcc6e866c5527ce45 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Thu, 6 Dec 2018 10:49:03 -0600 Subject: [PATCH v10 1/2] Avoid creation of the free space map for small tables. The FSM isn't created if the heap has 4 blocks or fewer. If the last known target block has insufficient space, try every block before extending the heap. If a heap with a FSM is truncated back to below the threshold, the FSM stays around and can be used as usual. 
--- contrib/pageinspect/expected/page.out | 77 +++ contrib/pageinspect/sql/page.sql | 33 +-- doc/src/sgml/storage.sgml | 13 +- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 +- src/backend/access/heap/hio.c | 47 ++-- src/backend/access/transam/xact.c | 14 ++ src/backend/commands/vacuumlazy.c | 17 +- src/backend/storage/freespace/README | 11 +- src/backend/storage/freespace/freespace.c | 251 +- src/backend/storage/freespace/indexfsm.c | 6 +- src/include/storage/freespace.h | 9 +- src/test/regress/expected/fsm.out | 62 ++ src/test/regress/parallel_schedule| 5 + src/test/regress/sql/fsm.sql | 46 15 files changed, 498 insertions(+), 105 deletions(-) create mode 100644 src/test/regress/expected/fsm.out create mode 100644 src/test/regress/sql/fsm.sql diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out index 3fcd9fbe6d..83e5910453 100644 --- a/contrib/pageinspect/expected/page.out +++ b/contrib/pageinspect/expected/page.out @@ -1,48 +1,69 @@ CREATE EXTENSION pageinspect; -CREATE TABLE test1 (a int, b int); -INSERT INTO test1 VALUES (16777217, 131584); -VACUUM test1; -- set up FSM +CREATE TABLE test_rel_forks (a int); +-- Make sure there are enough blocks in the heap for the FSM to be created. +INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g; +-- set up FSM and VM +VACUUM test_rel_forks; -- The page contents can vary, so just test that it can be read -- successfully, but don't keep the output. 
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0; main_0 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1; -ERROR: block number 1 is out of range for relation "test1" -SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; +ERROR: block number 100 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; fsm_0 --- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1; - fsm_1 - 8192 -(1 row) - -SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; +ERROR: block number 10 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0; vm_0 -- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1; -ERROR: block number 1 is out of range for relation "test1" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1; +ERROR: block number 1 is out of range for relation "test_rel_forks" SELECT octet_length(get_raw_page('xxx', 'main', 0)); ERROR: relation "xxx" does not exist -SELECT octet_length(get_raw_page('test1', 'xxx', 0)); +SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0)); ERROR: invalid fork name HINT: Valid fork names are "main", "fsm", "vm", and "init". -SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0); +SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); + fsm_page_contents
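The behavior described in the patch's commit message can be observed from SQL. This is a minimal sketch assuming the patch is applied and its threshold of 4 heap blocks; the table name is invented, and the exact row counts needed to cross the threshold are only approximate:

```sql
-- Under the patch, a heap of 4 or fewer blocks gets no FSM fork.
CREATE TABLE fsm_demo (a int);
INSERT INTO fsm_demo SELECT g FROM generate_series(1, 100) g;  -- fits in one block
VACUUM fsm_demo;
SELECT pg_relation_size('fsm_demo', 'fsm');  -- should be 0 with the patch: no FSM

-- Growing the heap past the threshold makes VACUUM create the FSM as usual.
INSERT INTO fsm_demo SELECT g FROM generate_series(1, 10000) g;
VACUUM fsm_demo;
SELECT pg_relation_size('fsm_demo', 'fsm') > 0;  -- should now be true
```

This mirrors the approach of the patch's regression tests, which probe the fork sizes directly rather than the heap contents.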
Re: proposal: plpgsql pragma statement
On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule wrote: > My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This is > not runtime statement - the information from this command will be assigned to > related object - function, block, command at parser time. That's sensible, but the syntax you were proposing didn't look like it was related to a specific statement. I was objecting to the idea that PRAGMA whatever; should be construed as an annotation of, specifically, the following statement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: proposal: plpgsql pragma statement
čt 6. 12. 2018 v 18:05 odesílatel Pavel Stehule napsal: > > > čt 6. 12. 2018 v 17:57 odesílatel Robert Haas > napsal: > >> On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris >> wrote: >> > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it >> affecting runtime? >> >> Well, I don't know what Ada does with PRAGMA exactly, but look at >> these examples from Oracle: >> >> http://psoug.org/definition/pragma.htm >> >> You wouldn't *execute* those at runtime, but at least for some of >> them, the runtime behavior would depend on whether or not they were >> specified. It certainly seems possible that we might want to have >> similar things. >> > > My proposal doesn't block it. > > The pragma in Ada has three levels - function, block, statement. I propose > (in this moment) just statement level syntax, but I am sure, so other > levels are possible. > My idea about plpgsql PRAGMA is very close to PL/SQL or Ada PRAGMA. This is not runtime statement - the information from this command will be assigned to related object - function, block, command at parser time. > I would to have a autonomous functions or autonomous blocks too, and Ada > syntax (same with PL/SQL) is good. > > Regards > > Pavel > > > > >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> >
Re: slow queries over information schema.tables
On Thu, Dec 6, 2018 at 12:03 PM Tom Lane wrote: > In my testing, that still hits AIM() during parserOpenTable(). Oh, I see. relation_openrv_extended() calls it. > [ further experimentation... ] It looks like if you prepare > a query and then just execute it repeatedly in one transaction, > you'd not reach AIM (as long as you were getting generic plans). > Possibly that's a gap worth closing. If we called it at the start of every query, couldn't we dispense with the call in relation_openrv_extended()? On net, that would actually mean fewer calls to AcceptInvalidationMessages(), assuming you sometimes run queries that touch multiple relations. And the behavior would be more predictable, too, because you'd (hopefully) have no cases where a command failed to see the results of DDL that committed before the command was issued. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: proposal: plpgsql pragma statement
čt 6. 12. 2018 v 17:57 odesílatel Robert Haas napsal: > On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris > wrote: > > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it > affecting runtime? > > Well, I don't know what Ada does with PRAGMA exactly, but look at > these examples from Oracle: > > http://psoug.org/definition/pragma.htm > > You wouldn't *execute* those at runtime, but at least for some of > them, the runtime behavior would depend on whether or not they were > specified. It certainly seems possible that we might want to have > similar things. > My proposal doesn't block it. The pragma in Ada has three levels - function, block, statement. I propose (at this moment) just statement-level syntax, but I am sure other levels are possible. I would like to have autonomous functions or autonomous blocks too, and the Ada syntax (same as PL/SQL) is good. Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: slow queries over information schema.tables
Robert Haas writes: > On Thu, Dec 6, 2018 at 11:32 AM Tom Lane wrote: >> It's fairly hard to imagine practical cases where we'd not call >> AcceptInvalidationMessages at least once per query, so I'm not >> very sure what you're on about. > Unless I'm confused, it happens any time you run a query that only > touches tables using lockmodes previously acquired by the current > transaction. Like: > BEGIN; > some query; > the same query again; In my testing, that still hits AIM() during parserOpenTable(). [ further experimentation... ] It looks like if you prepare a query and then just execute it repeatedly in one transaction, you'd not reach AIM (as long as you were getting generic plans). Possibly that's a gap worth closing. regards, tom lane
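The gap Tom describes — a prepared query re-executed with a generic plan inside one transaction never reaching AcceptInvalidationMessages — can be sketched as a two-session scenario. This is an illustration, not a tested reproducer; the domain and table names are invented, and whether the second EXECUTE notices the DDL depends on whether anything on its code path happens to process invalidations:

```sql
-- Session 1:
CREATE DOMAIN posint AS int CHECK (VALUE > 0);
CREATE TABLE t (v posint);
BEGIN;
PREPARE ins AS INSERT INTO t VALUES ($1);
EXECUTE ins(5);          -- plan built and cached

-- Session 2, concurrently:
ALTER DOMAIN posint ADD CONSTRAINT lt100 CHECK (VALUE < 100);

-- Session 1 again, same transaction:
EXECUTE ins(500);        -- the locks are already held, so nothing forces an
                         -- AcceptInvalidationMessages() call; the cached plan
                         -- may still enforce only the old constraint set
COMMIT;
```

This is the same class of problem the domain-constraint sinval patch earlier in this digest addresses from the other side: that patch makes sure ALTER DOMAIN sends an invalidation at all, while Tom's CheckCachedPlan idea would make sure a re-used plan with PlanInvalItems actually goes and looks for it.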
Re: proposal: plpgsql pragma statement
Ahh. Gotcha. Makes sense. On Thu, Dec 6, 2018 at 11:57 AM Robert Haas wrote: > On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris > wrote: > > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it > affecting runtime? > > Well, I don't know what Ada does with PRAGMA exactly, but look at > these examples from Oracle: > > http://psoug.org/definition/pragma.htm > > You wouldn't *execute* those at runtime, but at least for some of > them, the runtime behavior would depend on whether or not they were > specified. It certainly seems possible that we might want to have > similar things. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jonah H. Harris
Re: [PATCH] Opclass parameters
On Thu, Dec 6, 2018 at 11:55 AM Tom Lane wrote: > How about saying that you must give an opclass name if you want to > specify options, ie the syntax is > > [ opclass_name [ ( options... ) ] ] > > I'm not necessarily wedded to that, but it seems worth throwing > out the idea. Agreed, that's not bad, certainly better than making OPTIONS more reserved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: proposal: plpgsql pragma statement
On Thu, Dec 6, 2018 at 11:39 AM Jonah H. Harris wrote: > IIRC, PRAGMA in Ada was compile-time only. How would you foresee it affecting > runtime? Well, I don't know what Ada does with PRAGMA exactly, but look at these examples from Oracle: http://psoug.org/definition/pragma.htm You wouldn't *execute* those at runtime, but at least for some of them, the runtime behavior would depend on whether or not they were specified. It certainly seems possible that we might want to have similar things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Opclass parameters
Robert Haas writes: > On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov wrote: >> "opclass (options)" looks the most natural to me. But we still need some >> keyword before the parentheses when the opclass is not specified since we >> can't distinguish "func_name (func_params)" and "col_name (opclass_options)" >> in grammar. > Are you sure? What's the SQL syntax where there is actually a problem > here? CREATE INDEX requires parentheses around a non-trivial > expression. Well, the reason we have to require parens around nontrivial expressions is mostly lack of forethought about making the syntax non-ambiguous :-( > How about just OPTIONS (options) ? That would require making OPTIONS a fully reserved word, I think, else it's ambiguous with an opclass name. How about saying that you must give an opclass name if you want to specify options, ie the syntax is [ opclass_name [ ( options... ) ] ] I'm not necessarily wedded to that, but it seems worth throwing out the idea. regards, tom lane
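Under Tom's suggested grammar, options could only be attached when an opclass name is written out. A hypothetical CREATE INDEX might then look like the following sketch; text_pattern_ops is a real opclass, but the option name is invented purely for illustration, since no opclass options exist yet:

```sql
CREATE TABLE docs (body text);

-- No opclass, no options: unchanged from today.
CREATE INDEX ON docs (body);

-- With options under the proposal: the opclass name must be spelled out
-- before the parenthesized option list, which keeps the grammar unambiguous.
CREATE INDEX ON docs (body text_pattern_ops (some_option = 42));
```

Requiring the opclass name resolves the ambiguity Nikita raised, because `col_name (options)` alone cannot be distinguished from a function call `func_name (args)` in an index expression.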
Re: slow queries over information schema.tables
On Thu, Dec 6, 2018 at 11:32 AM Tom Lane wrote: > Robert Haas writes: > > I'm not thrilled about depending on sinval without locking, > > particularly given that my proposal to make sure we > > AcceptInvalidationMessages() at least once per query was shouted down. > > It's fairly hard to imagine practical cases where we'd not call > AcceptInvalidationMessages at least once per query, so I'm not > very sure what you're on about. Unless I'm confused, it happens any time you run a query that only touches tables using lockmodes previously acquired by the current transaction. Like: BEGIN; some query; the same query again; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: proposal: plpgsql pragma statement
čt 6. 12. 2018 v 17:27 odesílatel Robert Haas napsal: > On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule > wrote: > > I wrote plpgsql_check https://github.com/okbob/plpgsql_check. > > > > It is working well, but because it does static analyse only, sometimes > it can produces false alarms or it should to stop a analyse, because there > are not necessary data. > > > > https://github.com/okbob/plpgsql_check/issues/36 > > > > I see one possible solution in introduction of pragma statement with > syntax: > > > > PRAGMA keyword [content to semicolon]; > > > > The pragma has a relation to following statement. So the issue 36 can be > solved by pragma > > > > PRAGMA cmdtype CREATE; > > EXECUTE format('CREATE TABLE xxx ... > > > > The PRAGMA statement does nothing in runtime. It works only in compile > time, and add a pair of key, value to next non pragma statement. This > information can be used by some plpgsql extensions. > > > > What do you think about this proposal? > > I think it's commandeering PRAGMA for a fairly narrow purpose. It's > hardly unimaginable that someone in future might want a PRAGMA that > does change runtime behavior, or that affects something other than the > statement which immediately follows. > > I don't see a big problem allowing some kind of annotation that > plpgsql_check can easily access, and I don't even mind it being called > PRAGMA. But I don't think it should foreclose unrelated uses of > PRAGMA which somebody might want to introduce in the future. The simplest (and probably the only possible) implementation is a plpgsql statement, because I would not like to modify the SQL lexer. I think a PRAGMA statement is natural given its semantics and usage in Ada, but I am open to any proposals. Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [PATCH] Opclass parameters
On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov wrote: > I agree that we should distinguish per-index and per-column options, but they > can also be AM-specific and opclass-specific. True, but an index is bound to a single AM, and a column is bound to a single opclass which is bound to a single AM. So I'm not very worried about name collisions. Can't we just tell opclass authors to pick names that are unlikely to collide with names chose by the AM or names that are globally reserved? It's hard to imagine that we're ever going to have more than a dozen or so options that could possibly apply to a column, so splitting things up into different namespaces seems like an unnecessary burden on the user. > 'fastupdate' option for GIN is an example of AM-specific per-index option. > > ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column > options having special SQL syntax. "AM-class-specific" here means "specific > only for the class of AMs that support ordering". Now they are stored as > flags > in pg_index.indoption[] but later can be moved to pg_attribute.attoptions. Or left where they are. > And another problem is the options with default values. They may be not > explicitly specified by the user, and in this case in current implementation > nothing will be stored in the catalog because default values can be obtained > from the code. But this will not allow changing of default values without > compatibility issues. So I think it would be better to store both default and > explicitly specified values of all opclass options, but this will require a > major refactoring of current API. Hmm. I think if the default ever changes, it will require a new major release, and we can compensate in pg_dump. That means that a dump taken with an old version will not preserve the options. However, using the pg_dump from the newest release is the recommended procedure, and if you don't do that, you might get outright failures. 
Failing to preserve an option value in the rare case that a default was changed seems less bad than that. > Also I have idea to define list of opclass parameters declaratively when > opclass > is created using syntax like the following: > > CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) > ] > FOR TYPE ... AS ( > { OPTIONS function_name ( arg_type [,...] ) /* reloptions => binary */ > | OPERATOR ... > } [,...] > ) I'm not sure exposing SQL syntax for this is a very good idea. > "[opclass] WITH OPTIONS (options)" looks too verbose, of course. It's not that bad. > "[opclass] WITH (options)" looks acceptable, but it seems to conflict with > exclusion constraints syntax ("index_elem WITH operator"). Are you sure? The operator can't be ( But it might be confusing anyhow. > "opclass (options)" looks the most natural to me. But we still need some > keyword before the parentheses when the opclass is not specified since we > can't distinguish "func_name (func_params)" and "col_name (opclass_options)" > in grammar. Are you sure? What's the SQL syntax where there is actually a problem here? CREATE INDEX requires parentheses around a non-trivial expression. How about just OPTIONS (options) ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)
Hello! Thank you for the questions! > I've got few questions: > 1. How will the subscription work for inherited tables? Do we need tests for > that? For a subscription created with `FOR TABLE` we can't support inherited tables, because the subscriber doesn't know anything about inheritance. In the new patch I removed `ONLY` for `FOR TABLE` in subscription-related statements. > 2. ALTER PUBLICATION has ADD\DROP and SET. Should we add SET too? Or is there > a reason not to do that? Added it in the new patch. > 3. Message "Must be superuser to create FOR ALL TABLES subscriptions" seems a > bit strange to me. Also, this literal is embedded into translations. I do not > know how we deal with it, how do we deal for example with "måste vara > superanvändare för att skapa prenumerationer" or "для создания подписок нужно > быть суперпользователем"? Where do we insert FOR ALL TABLES? I added the hint `Use CREATE SUBSCRIPTION ... FOR TABLE ...` > 4. How does default behavior differ from FOR ALL TABLES? It is the same as the default implementation. > 5. Can we alter subscription FOR ALL TABLES? Drop some tables out of the > subscription? For subscriptions created with `FOR ALL TABLES` (the default), you can't change the subscribed tables with `ALTER SUBSCRIPTION ADD/DROP` table; you should use `ALTER SUBSCRIPTION REFRESH PUBLICATION`. And I don't know how to export user-created subscriptions, because currently a non-superuser can't select the subconninfo column. 03.12.2018, 09:06, "Andrey Borodin" : > Hi, Evgeniy! > > Thanks for working on the feature. >> 28 нояб. 2018 г., в 21:41, Evgeniy Efimkin >> написал(а): >> >> Hello! >> I wrote some tests(it's just 01_rep_changes.pl but for non superuser) and >> fix `DROP TABLE` from subscription. Now old and new tests pass. >> >> 22.11.2018, 16:23, "Evgeniy Efimkin" : >>> Hello! >>> New draft attached with filtering table in subscription (ADD/DROP) and >>> allow non-superusers use` CREATE SUBSCRIPTION` for own tables. > > I've looked into the patch. The code looks good and coherent to nearby code.
> There are no docs, obviously, there is WiP. > > I've got few questions: > 1. How will the subscription work for inherited tables? Do we need tests for > that? > 2. ALTER PUBLICATION has ADD\DROP and SET. Should we add SET too? Or is there > a reason not to do that? > 3. Message "Must be superuser to create FOR ALL TABLES subscriptions" seems a > bit strange to me. Also, this literal is embedded into translations. I do not > know how we deal with it, how do we deal for example with "måste vara > superanvändare för att skapa prenumerationer" or "для создания подписок нужно > быть суперпользователем"? Where do we insert FOR ALL TABLES? > 4. How does default behavior differ from FOR ALL TABLES? > 5. Can we alter subscription FOR ALL TABLES? Drop some tables out of the > subscription? > > Best regards, Andrey Borodin. Ефимкин Евгений diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index e136aa6a0b..0782dd40f0 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -38,6 +38,7 @@ #include "utils/syscache.h" + static List *textarray_to_stringlist(ArrayType *textarray); /* @@ -70,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok) sub->name = pstrdup(NameStr(subform->subname)); sub->owner = subform->subowner; sub->enabled = subform->subenabled; + sub->alltables = subform->suballtables; /* Get conninfo */ datum = SysCacheGetAttr(SUBSCRIPTIONOID, diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 9021463a4c..58f71a227c 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -30,6 +30,7 @@ #include "catalog/pg_subscription.h" #include "catalog/pg_subscription_rel.h" +#include "commands/dbcommands.h" #include "commands/defrem.h" #include "commands/event_trigger.h" #include "commands/subscriptioncmds.h" @@ -322,6 +323,22 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel) char 
originname[NAMEDATALEN]; bool create_slot; List *publications; + AclResult aclresult; + bool alltables; + + alltables = !stmt->tables; + /* FOR ALL TABLES requires superuser */ + if (alltables && !superuser()) + ereport(ERROR, +(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to create subscriptions"), + errhint("Use CREATE SUBSCRIPTION ... FOR TABLE ..."; + + /* must have CREATE privilege on database */ + aclresult = pg_database_aclcheck(MyDatabaseId, GetUserId(), ACL_CREATE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, OBJECT_DATABASE, + get_database_name(MyDatabaseId)); /* * Parse and check options. @@ -342,11 +359,6 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel) if (create_slot) PreventInTransactionBlock(isTopLevel, "CREATE SUBSCRIPTION ... WITH (create_slot = true)"); - if (!superuser()) -
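Putting the answers above together, the proposed user-visible behavior can be sketched as follows. This is a reading of the WIP patch, not committed syntax; names are invented, and the exact placement of the FOR TABLE clause may differ in the final grammar:

```sql
-- Proposed: a non-superuser with CREATE privilege on the database may
-- subscribe to specific tables (inheritance is not expanded, so no ONLY).
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=pub.example dbname=src'
    PUBLICATION mypub
    FOR TABLE my_table;

-- Proposed: the table list can be adjusted later, mirroring ALTER PUBLICATION.
ALTER SUBSCRIPTION mysub ADD TABLE other_table;
ALTER SUBSCRIPTION mysub DROP TABLE my_table;

-- Unchanged: FOR ALL TABLES (the default) still requires superuser, and its
-- table set is managed only via:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```

The hint added by the patch ("Use CREATE SUBSCRIPTION ... FOR TABLE ...") points non-superusers at the first form.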
Re: proposal: plpgsql pragma statement
IIRC, PRAGMA in Ada was compile-time only. How would you foresee it affecting runtime? On Thu, Dec 6, 2018 at 11:27 AM Robert Haas wrote: > On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule > wrote: > > I wrote plpgsql_check https://github.com/okbob/plpgsql_check. > > > > It is working well, but because it does static analyse only, sometimes > it can produces false alarms or it should to stop a analyse, because there > are not necessary data. > > > > https://github.com/okbob/plpgsql_check/issues/36 > > > > I see one possible solution in introduction of pragma statement with > syntax: > > > > PRAGMA keyword [content to semicolon]; > > > > The pragma has a relation to following statement. So the issue 36 can be > solved by pragma > > > > PRAGMA cmdtype CREATE; > > EXECUTE format('CREATE TABLE xxx ... > > > > The PRAGMA statement does nothing in runtime. It works only in compile > time, and add a pair of key, value to next non pragma statement. This > information can be used by some plpgsql extensions. > > > > What do you think about this proposal? > > I think it's commandeering PRAGMA for a fairly narrow purpose. It's > hardly unimaginable that someone in future might want a PRAGMA that > does change runtime behavior, or that affects something other than the > statement which immediately follows. > > I don't see a big problem allowing some kind of annotation that > plpgsql_check can easily access, and I don't even mind it being called > PRAGMA. But I don't think it should foreclose unrelated uses of > PRAGMA which somebody might want to introduce in the future. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- Jonah H. Harris
Re: slow queries over information schema.tables
Robert Haas writes: > I'm not thrilled about depending on sinval without locking, > particularly given that my proposal to make sure we > AcceptInvalidationMessages() at least once per query was shouted down. It's fairly hard to imagine practical cases where we'd not call AcceptInvalidationMessages at least once per query, so I'm not very sure what you're on about. regards, tom lane
Re: proposal: plpgsql pragma statement
On Tue, Dec 4, 2018 at 12:13 PM Pavel Stehule wrote: > I wrote plpgsql_check https://github.com/okbob/plpgsql_check. > > It is working well, but because it does static analyse only, sometimes it can > produces false alarms or it should to stop a analyse, because there are not > necessary data. > > https://github.com/okbob/plpgsql_check/issues/36 > > I see one possible solution in introduction of pragma statement with syntax: > > PRAGMA keyword [content to semicolon]; > > The pragma has a relation to following statement. So the issue 36 can be > solved by pragma > > PRAGMA cmdtype CREATE; > EXECUTE format('CREATE TABLE xxx ... > > The PRAGMA statement does nothing in runtime. It works only in compile time, > and add a pair of key, value to next non pragma statement. This information > can be used by some plpgsql extensions. > > What do you think about this proposal? I think it's commandeering PRAGMA for a fairly narrow purpose. It's hardly unimaginable that someone in future might want a PRAGMA that does change runtime behavior, or that affects something other than the statement which immediately follows. I don't see a big problem allowing some kind of annotation that plpgsql_check can easily access, and I don't even mind it being called PRAGMA. But I don't think it should foreclose unrelated uses of PRAGMA which somebody might want to introduce in the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
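For concreteness, here is how the statement-level pragma from the quoted proposal might look inside a function body. The surrounding function is invented, and the PRAGMA line follows the proposed syntax only — current plpgsql would reject it:

```sql
CREATE FUNCTION make_partition(tabname text) RETURNS void AS $$
BEGIN
    -- Proposed: the pragma annotates the immediately following statement,
    -- telling plpgsql_check what kind of command the dynamic SQL will run,
    -- so it can avoid a false alarm on the EXECUTE. No runtime effect.
    PRAGMA cmdtype CREATE;
    EXECUTE format('CREATE TABLE %I (LIKE parent INCLUDING ALL)', tabname);
END;
$$ LANGUAGE plpgsql;
```

Robert's objection is not to this usage but to reserving the PRAGMA keyword exclusively for statement-scoped, compile-time-only annotations.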
Re: psql display of foreign keys
On 2018-Dec-06, Robert Haas wrote: > On Wed, Dec 5, 2018 at 11:30 AM Alvaro Herrera > wrote: > > I added this patch to commitfest in order to get more opinions, > > particularly on whether to backpatch this. I might commit sooner than > > that if others care to comment. > I don't think this is a bug fix, and therefore I oppose back-patching it. OK. That means I can just get pg_partition_root() done first, and try to write the queries using that instead of WITH RECURSIVE. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: psql display of foreign keys
On Wed, Dec 5, 2018 at 11:30 AM Alvaro Herrera wrote: > I added this patch to commitfest in order to get more opinions, > particularly on whether to backpatch this. I might commit sooner than > that if others care to comment. I don't think this is a bug fix, and therefore I oppose back-patching it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: slow queries over information schema.tables
On Wed, Dec 5, 2018 at 1:41 PM Tom Lane wrote: > Ah, yes, that is a case where we might have enough knowledge to prove > the test redundant --- but considering that we explicitly discourage > domain NOT NULL as bad style and not fully supported, I can't get > excited about it. I suppose in some cases we might be able to use > predtest.c to prove domain CHECK constraints redundant, but I bet that > it's not worth the trouble. > > The bigger picture here is that people seem to like to use domains > as simple type aliases, which will never have any constraints, but > we're very dumb about that today. So the patch as presented seems > like it has lots of potential applicability, as long as we fix the > invalidation aspect. I'm not thrilled about depending on sinval without locking, particularly given that my proposal to make sure we AcceptInvalidationMessages() at least once per query was shouted down. That means that in corner cases, you could execute many queries without flushing the old cache entries. However, I do agree that locking every type, function, operator, etc. involved in the query is impractical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: minor leaks in pg_dump (PG tarball 10.6)
Greetings, * Pavel Raiskup (prais...@redhat.com) wrote: > On Wednesday, December 5, 2018 4:59:18 PM CET Stephen Frost wrote: > > This change doesn't seem to make any sense to me..? If anything, seems > > like we'd end up overallocating memory *after* this change, where we > > don't today (though an analyzer tool might complain because we don't > > free the memory from it and instead copy the pointer from each of these > > items into the tbinfo structure). > > Correct, I haven't think that one through. I was confused that some items > related to the dropped columns could be unreferenced. But those are > anyways allocated as a solid block with others (not intended to be ever > free()'d). Feel free to ignore that. Right. I've pushed the other changes. Thanks! Stephen signature.asc Description: PGP signature
Re: Minor typo
Greetings, * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote: > At Tue, 4 Dec 2018 11:37:16 -0500, Stephen Frost wrote > in <20181204163716.gr3...@tamriel.snowman.net> > > Thanks to everyone for sharing their thoughts here, the goal is simply > > to try and have the comments as clear as we can for everyone. > > > > Please find attached a larger rewording of the comment in xlogrecord.h, > > and a minor change to xloginsert.c to clarify that we're removing a hole > > and not doing compression at that point. > > > > Other thoughts on this..? > > Thank you for the complete rewriting. It makes the description > clearer at least for me, execpt the following: > > > * present is BLCKSZ - the length of "hole" bytes. > > Maybe it's because I'm not so accustomed to punctuation marks but > I was confused for a second because the '-' didn't look to me a > minus sign, but a dash. If it is not specific to me, a word > 'minus' seems better or, (BLCKSZ - ) bytes > is clearer for me. I agree- that threw me off a bit also when I was first reading it. I've updated that part a bit and pushed it. Thanks! Stephen signature.asc Description: PGP signature
Re: zheap: a new storage format for PostgreSQL
čt 6. 12. 2018 v 17:02 odesílatel Robert Haas napsal: > On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule > wrote: > > čt 6. 12. 2018 v 16:26 odesílatel Robert Haas > napsal: > >> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule > wrote: > >> > I have a problem to imagine it. When fill factor will be low, then > there is high risk of high fragmentation - or there some body should to do > defragmentation. > >> > >> I don't understand this. > > > > I don't know if zheap has or has not any tools for elimination > fragmentation of space of page. But I expect so after some set of updates, > when record size is mutable, the free space on page should be fragmented. > Usually, when you have less memory, then fragmentation is faster. > > Still not sure I completely understand, but it's true that zheap > sometimes needs to compact free space on a page. For example, if > you've got a page with a 100-byte hole, and somebody updates a tuple > to make it 2 bytes bigger, you've got to shift that tuple and any that > precede it backwards to reduce the size of the hole to 98 bytes, so > that you can fit the new version of the tuple. If, later, somebody > shrinks that tuple back to the original size, you've now got 100 bytes > of free space on the page, but they are fragmented: 98 bytes in the > "hole," and 2 bytes following the newly-shrunk tuple. If someone > tries to insert a 100-byte tuple in that page, we'll need to > reorganize the page a second time to bring all that free space back > together in a single chunk. > > In my view, and I'm not sure if this is how the code currently works, > we should have just one routine to do a zheap page reorganization > which can cope with all possible scenarios. I imagine that you would > give it the page is it currently exists plus a "minimum tuple size" > for one or more tuples on the page (which must not be smaller than the > current size of that tuple, but could be bigger). 
It then reorganizes > the page so that every tuple for which a minimum size was given > consumes exactly that amount of space, every other tuple consumes the > minimum possible amount of space, and the remaining space goes into > the hole. So if you call this function with no minimal tuple sizes, > it does a straight defragmentation; if you give it minimum tuple > sizes, then it rearranges the page to make it suitable for a pending > in-place update of those tuples. > > Actually, I think Amit and I discussed further refining this by > splitting the page reorganization function in half. One half would > make a plan for where to put each tuple on the page following the > reorg, but would not actually do anything. That would be executed > before entering a critical section, and might fail if the requested > minimum tuple sizes can't be satisfied. The other half would take the > previously-constructed plan as input and perform the reorganization. > That would be done in the critical section. > > Thank you for reply Pavel -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: zheap: a new storage format for PostgreSQL
On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule wrote: > čt 6. 12. 2018 v 16:26 odesílatel Robert Haas napsal: >> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule >> wrote: >> > I have a problem to imagine it. When fill factor will be low, then there >> > is high risk of high fragmentation - or there some body should to do >> > defragmentation. >> >> I don't understand this. > > I don't know if zheap has or has not any tools for elimination fragmentation > of space of page. But I expect so after some set of updates, when record size > is mutable, the free space on page should be fragmented. Usually, when you > have less memory, then fragmentation is faster. Still not sure I completely understand, but it's true that zheap sometimes needs to compact free space on a page. For example, if you've got a page with a 100-byte hole, and somebody updates a tuple to make it 2 bytes bigger, you've got to shift that tuple and any that precede it backwards to reduce the size of the hole to 98 bytes, so that you can fit the new version of the tuple. If, later, somebody shrinks that tuple back to the original size, you've now got 100 bytes of free space on the page, but they are fragmented: 98 bytes in the "hole," and 2 bytes following the newly-shrunk tuple. If someone tries to insert a 100-byte tuple in that page, we'll need to reorganize the page a second time to bring all that free space back together in a single chunk. In my view, and I'm not sure if this is how the code currently works, we should have just one routine to do a zheap page reorganization which can cope with all possible scenarios. I imagine that you would give it the page is it currently exists plus a "minimum tuple size" for one or more tuples on the page (which must not be smaller than the current size of that tuple, but could be bigger). 
It then reorganizes the page so that every tuple for which a minimum size was given consumes exactly that amount of space, every other tuple consumes the minimum possible amount of space, and the remaining space goes into the hole. So if you call this function with no minimal tuple sizes, it does a straight defragmentation; if you give it minimum tuple sizes, then it rearranges the page to make it suitable for a pending in-place update of those tuples. Actually, I think Amit and I discussed further refining this by splitting the page reorganization function in half. One half would make a plan for where to put each tuple on the page following the reorg, but would not actually do anything. That would be executed before entering a critical section, and might fail if the requested minimum tuple sizes can't be satisfied. The other half would take the previously-constructed plan as input and perform the reorganization. That would be done in the critical section. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: zheap: a new storage format for PostgreSQL
čt 6. 12. 2018 v 16:26 odesílatel Robert Haas napsal: > On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule > wrote: > > I have a problem to imagine it. When fill factor will be low, then there > is high risk of high fragmentation - or there some body should to do > defragmentation. > > I don't understand this. > I don't know if zheap has or has not any tools for elimination fragmentation of space of page. But I expect so after some set of updates, when record size is mutable, the free space on page should be fragmented. Usually, when you have less memory, then fragmentation is faster. Pavel > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: COPY FROM WHEN condition
On Wed, Nov 28, 2018 at 6:17 PM Tomas Vondra wrote: > > Comparing with overhead of setting snapshot before evaluating every row > > and considering this > > > > kind of usage is not frequent it seems to me the behavior is acceptable > > I'm not really buying the argument that this behavior is acceptable > simply because using the feature like this will be uncommon. That seems > like a rather weak reason to accept it. > > I however agree we don't want to make COPY less efficient, at least not > unless absolutely necessary. But I think we can handle this simply by > restricting what's allowed to appear the FILTER clause. > > It should be fine to allow IMMUTABLE and STABLE functions, but not > VOLATILE ones. That should fix the example I shared, because f() would > not be allowed. I don't think that's a very good solution. It's perfectly sensible for someone to want to do WHERE/FILTER random() < 0.01 to load a smattering of rows, and this would rule that out for no very good reason. I think it would be fine to just document that if the filter condition examines the state of the database, it will not see the results of the COPY which is in progress. I'm pretty sure there are other cases - for example with triggers - where you can get code to run that can't see the results of the command currently in progress, so I don't really buy the idea that having a feature which works that way is categorically unacceptable. I agree that we can't justify flagrantly wrong behavior on the grounds that correct behavior is expensive or on the grounds that the incorrect cases will be rare. However, when there's more than one sensible behavior, it's not unreasonable to pick one that is easier to implement or cheaper or whatever, and that's the category into which I would put this decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: zheap: a new storage format for PostgreSQL
On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule wrote: > I have a problem to imagine it. When fill factor will be low, then there is > high risk of high fragmentation - or there some body should to do > defragmentation. I don't understand this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: zheap: a new storage format for PostgreSQL
čt 6. 12. 2018 v 16:12 odesílatel Robert Haas napsal: > On Thu, Dec 6, 2018 at 2:11 AM Pavel Stehule > wrote: > >> > I am sorry, I know zero about zheap - does zheap use fill factor? if > yes, why? > >> > >> Good question. It is required because tuples can expand (Update tuple > >> to bigger length). In such cases, we try to perform in-place update > >> if there is a space in the page. So, having fillfactor can help. > > > > Thank you for reply :) > > I suspect fillfactor is *more* likely to help with zheap than with the > current heap. With the current heap, you need to leave enough space > to store entire copies of the tuples to try to get HOT updates. But > with zheap you only need enough room for the anticipate growth in the > tuples. > > For instance, let's say that you plan to update 30% of the tuples in a > table and make them 1 byte larger. With the heap, you'd need to leave > ~ 3/13 = 23% of each page empty, plus a little bit more to allow for > the storage growth. So to make all of those updates HOT, you would > probably need a fillfactor of roughly 75%. Unfortunately, that will > make your table larger by one-third, which is terrible. > > On the other hand, with zheap, you only need to leave enough room for > the increased amount of tuple data. If you've got 121 items per page, > as in Mithun's statistics, that means you need 121 bytes of free space > to do all the updates in place. That means you need a fillfactor of 1 > - (121/8192) = ~98%. To be conservative you can set a fillfactor of > say 95%. Your table will only get slightly bigger, and all of your > updates will be in place, and everything will be great. At least with > respect to fillfactor -- zheap is not free of other problems. > I have a problem to imagine it. When fill factor will be low, then there is high risk of high fragmentation - or there some body should to do defragmentation. 
> Of course, you don't really set fillfactor based on an expectation of > a single round of tuple updates, but imagine that the workload goes on > for a while, with tuples getting bigger and smaller again as the exact > values being stored change. In a heap table, you need LOTS of empty > space on each page to get HOT updates. In a zheap table, you need > very little, because the updates are in place. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: \gexec \watch
On Thu, Dec 06, 2018 at 05:01:26AM -0300, Alvaro Herrera wrote: > On 2018-Dec-06, David Fetter wrote: > > > There's a bit of a philosophical issue here, or a mathematical one, > > whichever way you want to put it. Does it actually make sense to have > > the behavior of one "semicolon" spill onto another? > > Honestly, I don't see the mathematicality in this. It either works, or > it doesn't -- and from my POV right now it doesn't. Are you saying we > need a \gexecwatch for this to work? We could call it something a little shorter, but yes. The part you trimmed away had descriptions of why the current behavior is correct. We don't really have ways to compose \ operators. If we're going to add composition to the psql language, we should think it through carefully, not just glom it on for a single special case. > I can of course solve my problem with a simple python program, but psql > is so close ... Agreed! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: zheap: a new storage format for PostgreSQL
On Thu, Dec 6, 2018 at 2:11 AM Pavel Stehule wrote: >> > I am sorry, I know zero about zheap - does zheap use fill factor? if yes, >> > why? >> >> Good question. It is required because tuples can expand (Update tuple >> to bigger length). In such cases, we try to perform in-place update >> if there is a space in the page. So, having fillfactor can help. > > Thank you for reply :) I suspect fillfactor is *more* likely to help with zheap than with the current heap. With the current heap, you need to leave enough space to store entire copies of the tuples to try to get HOT updates. But with zheap you only need enough room for the anticipate growth in the tuples. For instance, let's say that you plan to update 30% of the tuples in a table and make them 1 byte larger. With the heap, you'd need to leave ~ 3/13 = 23% of each page empty, plus a little bit more to allow for the storage growth. So to make all of those updates HOT, you would probably need a fillfactor of roughly 75%. Unfortunately, that will make your table larger by one-third, which is terrible. On the other hand, with zheap, you only need to leave enough room for the increased amount of tuple data. If you've got 121 items per page, as in Mithun's statistics, that means you need 121 bytes of free space to do all the updates in place. That means you need a fillfactor of 1 - (121/8192) = ~98%. To be conservative you can set a fillfactor of say 95%. Your table will only get slightly bigger, and all of your updates will be in place, and everything will be great. At least with respect to fillfactor -- zheap is not free of other problems. Of course, you don't really set fillfactor based on an expectation of a single round of tuple updates, but imagine that the workload goes on for a while, with tuples getting bigger and smaller again as the exact values being stored change. In a heap table, you need LOTS of empty space on each page to get HOT updates. 
In a zheap table, you need very little, because the updates are in place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mutability of domain CHECK constraints
ALTER DOMAIN ADD CONSTRAINT goes to some effort to verify that existing stored data of the domain type meets the new constraint. (It's not bulletproof, because it can't see uncommitted data, but at least it tries.) However, what if the user tries to change the behavior of an existing constraint clause? Nothing, of course, since we have no idea that anything has changed. This issue occurred to me while thinking about this buglet: regression=# create function sqlcheck(int) returns bool as regression-# 'select $1 > 0' language sql; CREATE FUNCTION regression=# create domain checkedint as int check(sqlcheck(value)); CREATE DOMAIN regression=# select 1::checkedint; -- ok checkedint 1 (1 row) regression=# select 0::checkedint; -- fail ERROR: value for domain checkedint violates check constraint "checkedint_check" regression=# create or replace function sqlcheck(int) returns bool as 'select $1 <= 0' language sql; CREATE FUNCTION regression=# select 1::checkedint; -- fail? checkedint 1 (1 row) regression=# select 0::checkedint; -- ok? ERROR: value for domain checkedint violates check constraint "checkedint_check" The reason this isn't behaving as-expected is that typcache.c has cached a version of the domain's check constraint that sqlcheck() has been inlined into, so the old behavior continues to apply until something happens to cause the typcache entry to be flushed. I'd started to work on some code changes to make the typcache react more promptly, but then it occurred to me that the example is really dubious anyway because any stored data of the domain type won't be rechecked. And fixing *that* seems entirely impractical. So what I'm thinking we should do is document that the behavior of a domain CHECK constraint is expected to be immutable, and it's on the user's head to preserve consistency if it isn't. 
We could recommend that any attempt to change a constraint's behavior be implemented by dropping and re-adding the constraint, which is a case that the system does know what to do with. Actually, the same goes for table CHECK constraints ... Thoughts? regards, tom lane
Re: \gexec \watch
Hi Álvaro, > On 6. Dec 2018, at 13:56, Alvaro Herrera wrote: > > To Oleksii's question, I think if you want to repeat the first query > over and over, you'd use something like this: > > select format('select now() as execution_time, %L as generation_time', now()) > as query \gset > :query \watch Nice one, although it only works if the original query outputs a single row (because of \gset). I do agree it’s not that useful to reuse the original query instead of executing it anew each time. Cheers, Oleksii
Re: \gexec \watch
Alvaro Herrera wrote: > Hmm, thanks. AFAICS your hack reexecutes the initial query over and > over, instead of obtaining a fresh query each time. I see. That hack is about injecting something programmatically into the query buffer, but it seems you'd need to do that in a loop. And if psql had a loop construct you wouldn't need a hack in the first place I guess! Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: \gexec \watch
On 2018-Dec-06, Daniel Verite wrote: > I think you could achieve more or less the result with a pre-gexec > hack like that: Hmm, thanks. AFAICS your hack reexecutes the initial query over and over, instead of obtaining a fresh query each time. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: \gexec \watch
Alvaro Herrera wrote: > Honestly, I don't see the mathematicality in this. It either works, or > it doesn't -- and from my POV right now it doesn't. Are you saying we > need a \gexecwatch for this to work? > > I can of course solve my problem with a simple python program, but psql > is so close ... \watch reexecutes what's in the query buffer, and \gexec does not write into the query buffer, so the desired piping does not happen by design. I think you could achieve more or less the result with a pre-gexec hack like that: postgres=# \pset tuples_only on postgres=# select 'select now();' \g /tmp/file.sql postgres=# \setenv EDITOR touch postgres=# \e /tmp/file.sql 2018-12-06 13:54:24.915752+01 postgres=# \watch 2018-12-06 13:54:42.366559+01 2018-12-06 13:54:44.368962+01 2018-12-06 13:54:46.3713+01 The "\setenv EDITOR touch" kludge is meant to force \e to inject the contents of /tmp/file.sql into the query buffer. It's needed because "\e file" actually checks whether the file has been modified (per mtime) after $EDITOR returns, and discards it if it hasn't. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: \gexec \watch
čt 6. 12. 2018 v 13:56 odesílatel Alvaro Herrera napsal: > On 2018-Dec-06, Pavel Stehule wrote: > > > čt 6. 12. 2018 v 12:26 odesílatel Oleksii Kliukin > > napsal: > > > > The other question is whether such a command would execute the original > > > query every time watch is invoked. Consider, e.g. the following one: > > > > > > select format('select now() as execution_time, %L as generation_time', > > > now()) \gexec > > > execution_time | 2018-12-06 12:15:24.136086+01 > > > generation_time | 2018-12-06 12:15:24.13577+01 > > > > > > If we make \gexec + \watch combination re-execute only the output of > the > > > original query (without the query itself), then the generation time > column > > > will stay constant through all \watch invocations. > > > > It is better to introduce new command like \gexec_repeat with units like > > 5s, or how much 5x - > > It is? \gexec \watch is an elegant construct using two existing atoms > with well-defined semantics. Can't say I see that in \gexec_repeat -- > it seems non-orthogonal to me. > Maybe I am wrong, but currently is not possible to compose \ commands. So you should to introduce new pattern. There is enough long command buffer to implement it. Regards Pavel > To Oleksii's question, I think if you want to repeat the first query > over and over, you'd use something like this: > > select format('select now() as execution_time, %L as generation_time', > now()) as query \gset > :query \watch > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: \gexec \watch
On 2018-Dec-06, Pavel Stehule wrote: > čt 6. 12. 2018 v 12:26 odesílatel Oleksii Kliukin > napsal: > > The other question is whether such a command would execute the original > > query every time watch is invoked. Consider, e.g. the following one: > > > > select format('select now() as execution_time, %L as generation_time', > > now()) \gexec > > execution_time | 2018-12-06 12:15:24.136086+01 > > generation_time | 2018-12-06 12:15:24.13577+01 > > > > If we make \gexec + \watch combination re-execute only the output of the > > original query (without the query itself), then the generation time column > > will stay constant through all \watch invocations. > > It is better to introduce new command like \gexec_repeat with units like > 5s, or how much 5x - It is? \gexec \watch is an elegant construct using two existing atoms with well-defined semantics. Can't say I see that in \gexec_repeat -- it seems non-orthogonal to me. To Oleksii's question, I think if you want to repeat the first query over and over, you'd use something like this: select format('select now() as execution_time, %L as generation_time', now()) as query \gset :query \watch -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: \gexec \watch
Hi Oleksii On 2018-Dec-06, Oleksii Kliukin wrote: > The other question is whether such a command would execute the > original query every time watch is invoked. Consider, e.g. the > following one: > > select format('select now() as execution_time, %L as generation_time', now()) > \gexec > execution_time | 2018-12-06 12:15:24.136086+01 > generation_time | 2018-12-06 12:15:24.13577+01 > > If we make \gexec + \watch combination re-execute only the output of > the original query (without the query itself), then the generation > time column will stay constant through all \watch invocations. Hmm, I think reusing the first query is not terribly useful. My example (thus far) is something like this select format('select tableoid::regclass, * from %s where ctid = ''(%s,%s)''', relation::regclass, page, tuple) from pg_locks where locktype = 'tuple' and pid in (select pid from pg_locks where granted = false and locktype = 'transactionid') and database = (select oid from pg_database where datname = current_database()) \gexec [\watch] which is supposed to report the current tuple-level conflicts (two updates concurrently in the same tuple, etc). I want to get the PK/replica identity[*] of all tuples that some backend is currently waiting for; if the query remains constant, it will return me the identity of the tuple located in the CTID of the tuples that conflicted in the first iteration, which is completely useless. [*] Right now it just reports all columns rather than PK ... I intend to add that bit next. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Hint and detail punctuation
On 2018-Dec-06, Daniel Gustafsson wrote: > > On 6 Dec 2018, at 05:46, Michael Paquier wrote: > > > > On Wed, Dec 05, 2018 at 05:22:25PM +0100, Daniel Gustafsson wrote: > >> While looking at error messages downstream, I noticed a few hints and > >> details > >> in postgres which aren’t punctuated as per the style guide. The attached > >> patch > >> fixes the ones where it seemed reasonable to end with a period. > > > > Good point. I am spotting a couple more places: > > src/backend/utils/misc/guc.c: > > Ah, for some reason I hadn’t thought about looking at the GUCs. I agree with > your findings, patch updated. LGTM. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: \gexec \watch
čt 6. 12. 2018 v 12:26 odesílatel Oleksii Kliukin napsal: > > > > On 6. Dec 2018, at 09:01, Alvaro Herrera > wrote: > > > > On 2018-Dec-06, David Fetter wrote: > > > >> There's a bit of a philosophical issue here, or a mathematical one, > >> whichever way you want to put it. Does it actually make sense to have > >> the behavior of one "semicolon" spill onto another? > > > > Honestly, I don't see the mathematicality in this. It either works, or > > it doesn't -- and from my POV right now it doesn't. Are you saying we > > need a \gexecwatch for this to work? > > I’ve been trying to do similar stuff with periodic execution of \gexec > (changing the tablespace of all tables in the given one and retrying, since > some of them could only get a lock on subsequent attempts) and generally > reverted to a bash loop outside of psql, but having it built-in would be > great. > > Perhaps a numeric argument to \gexec, i.e. \gexec 5 to re-execute the > output of a query every 5 seconds? > looks not intuitive :) > The other question is whether such a command would execute the original > query every time watch is invoked. Consider, e.g. the following one: > > select format('select now() as execution_time, %L as generation_time', > now()) \gexec > execution_time | 2018-12-06 12:15:24.136086+01 > generation_time | 2018-12-06 12:15:24.13577+01 > > If we make \gexec + \watch combination re-execute only the output of the > original query (without the query itself), then the generation time column > will stay constant through all \watch invocations. > It is better to introduce new command like \gexec_repeat with units like 5s, or how much 5x - Regards Pavel > > Cheers, > Oleksii >
Re: Hint and detail punctuation
On Thu, Dec 06, 2018 at 09:42:26AM +0100, Daniel Gustafsson wrote: > Ah, for some reason I hadn’t thought about looking at the GUCs. I agree with > your findings, patch updated. Thanks Daniel, that looks fine to me at quick glance. I'll try to get that committed tomorrow my time if there are no objections until then. -- Michael signature.asc Description: PGP signature
Re: \gexec \watch
> On 6. Dec 2018, at 09:01, Alvaro Herrera wrote: > > On 2018-Dec-06, David Fetter wrote: > >> There's a bit of a philosophical issue here, or a mathematical one, >> whichever way you want to put it. Does it actually make sense to have >> the behavior of one "semicolon" spill onto another? > > Honestly, I don't see the mathematicality in this. It either works, or > it doesn't -- and from my POV right now it doesn't. Are you saying we > need a \gexecwatch for this to work? I’ve been trying to do similar stuff with periodic execution of \gexec (changing the tablespace of all tables in the given one and retrying, since some of them could only get a lock on subsequent attempts) and generally reverted to a bash loop outside of psql, but having it built-in would be great. Perhaps a numeric argument to \gexec, i.e. \gexec 5 to re-execute the output of a query every 5 seconds? The other question is whether such a command would execute the original query every time watch is invoked. Consider, e.g. the following one: select format('select now() as execution_time, %L as generation_time', now()) \gexec execution_time | 2018-12-06 12:15:24.136086+01 generation_time | 2018-12-06 12:15:24.13577+01 If we make \gexec + \watch combination re-execute only the output of the original query (without the query itself), then the generation time column will stay constant through all \watch invocations. Cheers, Oleksii
Re: Hint and detail punctuation
> On 6 Dec 2018, at 05:46, Michael Paquier wrote: > > On Wed, Dec 05, 2018 at 05:22:25PM +0100, Daniel Gustafsson wrote: >> While looking at error messages downstream, I noticed a few hints and details >> in postgres which aren’t punctuated as per the style guide. The attached >> patch >> fixes the ones where it seemed reasonable to end with a period. > > Good point. I am spotting a couple more places: > src/backend/utils/misc/guc.c: Ah, for some reason I hadn’t thought about looking at the GUCs. I agree with your findings, patch updated. cheers ./daniel errhint_punctuation-v2.patch Description: Binary data
Re: \gexec \watch
On 2018-Dec-06, David Fetter wrote: > There's a bit of a philosophical issue here, or a mathematical one, > whichever way you want to put it. Does it actually make sense to have > the behavior of one "semicolon" spill onto another? Honestly, I don't see the mathematicality in this. It either works, or it doesn't -- and from my POV right now it doesn't. Are you saying we need a \gexecwatch for this to work? I can of course solve my problem with a simple python program, but psql is so close ... -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services