Re: [HACKERS] Buildfarm not happy with test module move
On Mon, Dec 1, 2014 at 2:22 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>> That was my fault -- the alvh.no-ip.org domain was deleted, and the
>> email system in postgresql.org rejected the commit message because the
>> sender was not in a deliverable domain.  I noticed within a few hours,
>> but the damage was already done.
>
> I guess I'm confused about which is your preferred email address?

Yeah. I was assuming alvh.no-ip.org was going to come back, as that's
what you're still subscribed to critical mailing lists with :) Do you
need that one changed?

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm not happy with test module move
Magnus Hagander wrote:
> On Mon, Dec 1, 2014 at 2:22 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>>> That was my fault -- the alvh.no-ip.org domain was deleted, and the
>>> email system in postgresql.org rejected the commit message because
>>> the sender was not in a deliverable domain.  I noticed within a few
>>> hours, but the damage was already done.
>> I guess I'm confused about which is your preferred email address?
> Yeah. I was assuming alvh.no-ip.org was going to come back, as that's
> what you're still subscribed to critical mailing lists with :) Do you
> need that one changed?

My preferred email address continues to be alvhe...@alvh.no-ip.org.  It
was deleted by mistake; that wasn't intended.  (If you must know, it's a
new idiotic feature of no-ip.com, coupled with my server machine at home
dying because of a stuck SSD, which I thought was non-critical so didn't
replace in over half a year, coupled with a change in laptop; goes to
show that the non-critical machine wasn't really non-critical after all,
and that I'm juggling way too many things.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] compress method for spgist - 2
Initial message: http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru

The second version fixes forgotten changes in pg_am.

-- 
Teodor Sigaev                 E-mail: teo...@sigaev.ru
                              WWW: http://www.sigaev.ru/

Attachment: spgist_compress_method-2.patch.gz (GNU Zip compressed data)
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
Hi all

I've attached a revised patchset for GetSystemTimeAsFileTime and
GetSystemTimePreciseAsFileTime to address David Rowley's review notes.

Thanks for the review, and for poking me to follow up.

-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 63edf0571961f2f2c7fd6a08b99e747ee39a3377 Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Fri, 12 Sep 2014 12:41:35 +0800
Subject: [PATCH 1/2] Use GetSystemTimeAsFileTime directly in windows
 gettimeofday
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PostgreSQL was calling GetSystemTime followed by SystemTimeToFileTime in
the win32 port gettimeofday function. This is not necessary and limits
the reported precision to the 1ms granularity that the SYSTEMTIME struct
can represent. By using GetSystemTimeAsFileTime we avoid unnecessary
conversions and capture timestamps at 100ns granularity, which is then
rounded to 1µs granularity for storage in a PostgreSQL timestamp.

On most Windows systems this change will actually have no significant
effect, as the system timer tick is typically between 1ms and 15ms
depending on what timer resolution currently running applications have
requested. You can check this with clockres.exe from sysinternals.

Despite the platform limitation this change still permits capture of
finer timestamps where the system is capable of producing them, and it
gets rid of an unnecessary syscall.

Future work may permit use of GetSystemTimePreciseAsFileTime on Windows
8 and Windows Server 2012 for higher resolution time capture. This call
has the same interface as GetSystemTimeAsFileTime.
---
 src/port/gettimeofday.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 75a9199..73ec406 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -44,16 +44,14 @@
 int
 gettimeofday(struct timeval * tp, struct timezone * tzp)
 {
 	FILETIME	file_time;
-	SYSTEMTIME	system_time;
 	ULARGE_INTEGER ularge;
 
-	GetSystemTime(&system_time);
-	SystemTimeToFileTime(&system_time, &file_time);
+	GetSystemTimeAsFileTime(&file_time);
 	ularge.LowPart = file_time.dwLowDateTime;
 	ularge.HighPart = file_time.dwHighDateTime;
 
 	tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L);
-	tp->tv_usec = (long) (system_time.wMilliseconds * 1000);
+	tp->tv_usec = (long) (((ularge.QuadPart - epoch) % 10000000L) / 10);
 
 	return 0;
 }
-- 
1.9.3

From c3c9f38379dca3f6f59520c5ceafce34ee8c8d90 Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Thu, 18 Sep 2014 23:02:14 +0800
Subject: [PATCH 2/2] Use GetSystemTimePreciseAsFileTime when available

This will cause PostgreSQL on Windows 8 or Windows Server 2012 to
obtain high-resolution timestamps while allowing the same binaries to
run without problems on older releases.
---
 src/backend/main/main.c |  6 ++++++
 src/include/port.h      |  2 ++
 src/port/gettimeofday.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c51b391..73c30c5 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -260,6 +260,12 @@ startup_hacks(const char *progname)
 
 		/* In case of general protection fault, don't show GUI popup box */
 		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+		/* Figure out which syscall to use to capture timestamp information */
+		init_win32_gettimeofday();
+#endif
+
 	}
 #endif   /* WIN32 */

diff --git a/src/include/port.h b/src/include/port.h
index 94a0e2f..58677ec 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On windows we need to call some backend start setup for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else							/* !WIN32 */

diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 73ec406..a82a1a4 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -30,14 +30,66 @@
 
 #include <sys/time.h>
 
+#ifndef FRONTEND
+#include "utils/elog.h"
+#endif
+
 /* FILETIME of Jan 1 1970 00:00:00. */
 static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
 /*
+ * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
+ * signature, so we can just store a pointer to whichever we find. This
+ * is the pointer's type.
+ */
+typedef VOID (WINAPI *PgGetSystemTimeFn) (LPFILETIME);
+
+/* Storage for the function we pick at runtime */
+static PgGetSystemTimeFn pg_get_system_time = NULL;
+
+/*
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not,
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On 01/12/14 14:16, Craig Ringer wrote:
> +#ifndef FRONTEND
> +#include "utils/elog.h"
> +#endif
> +

I think this is a leftover, as you don't use elog afterwards.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Thu, Nov 27, 2014 at 11:43 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Thu, Nov 27, 2014 at 8:18 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
>> On Thu, Nov 27, 2014 at 1:07 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>>> On Thu, Nov 27, 2014 at 12:55 AM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
>>>> +1 to define some new object type and remove do_user and do_system.
>>>> But if we add OBJECT_SYSTEM to the ObjectType data type, system
>>>> catalogs are OBJECT_SYSTEM as well as OBJECT_TABLE. Isn't that a bit
>>>> redundant?
>>> Yes, kind of. That's a superset of a type of relations, aka a set of
>>> catalog tables. If you find something cleaner to propose, feel free.
>> I thought we could add a new type like ReindexObjectType, which has
>> REINDEX_OBJECT_TABLE, REINDEX_OBJECT_SYSTEM and so on. It's similar
>> to the GRANT syntax.
> Check. Another thing: ReindexDatabaseOrSchema should be renamed to
> ReindexObject. So, I think that we need to think a bit more here. We
> are not far from something that could be committed, so marking as
> Waiting on Author for now. Thoughts?
>> Is the table also a kind of object?
> Sorry, I am not sure I follow you here. Indexes and tables already have
> their relkind set in ReindexStmt, and I think that we're fine to
> continue letting them go in their own reindex code path for now.
>> That was not clear enough, sorry. I mean that there is already a
>> ReindexTable() function; if we renamed ReindexDatabaseOrSchema to
>> ReindexObject, it would feel uncomfortable, because a table is also a
>> kind of object.
> Check. If you get that done, I'll have an extra look at it and then
> let's have a committer look at it.

Attached patch is the latest version. I added new enum values like
REINDEX_OBJECT_XXX, and changed the ReindexObject() function. The ACL
problem is also fixed in this version.

Regards,

---
Sawada Masahiko

Attachment: 000_reindex_schema_v6.patch
Re: [HACKERS] [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
> Hi all
>
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the
> server.

Yes, this is not good.

> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read.

What's wrong with specifying its length in advance instead?  Are you
thinking of one or more use cases where it's both large and unknown?

Cheers,
David.
-- 
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
On 12/01/2014 10:38 PM, David Fetter wrote:
> On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
>> Hi all
>>
>> Currently the client must know the size of a large lob/clob field,
>> like a 'bytea' or 'text' field, in order to send it to the server.
>> This can force the client to buffer all the data before sending it to
>> the server.
> Yes, this is not good.
>> It would be helpful if the v4 protocol permitted the client to specify
>> the field length as unknown / TBD, then stream data until an end
>> marker is read.
> What's wrong with specifying its length in advance instead? Are you
> thinking of one or more use cases where it's both large and unknown?

I am - specifically, the JDBC setBlob(...) and setClob(...) APIs that
accept streams without a specified length:

https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setBlob(int,%20java.io.InputStream)
https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setClob(int,%20java.io.Reader)

There are variants that do take a length, so PgJDBC can (and now does)
implement the no-length variants by internally buffering the stream
until EOF.  It'd be nice to get rid of that, though.

-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
Craig Ringer <cr...@2ndquadrant.com> writes:
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the
> server.
>
> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read. Some encoding would be required for binary data to ensure that
> occurrences of the end marker in the streamed data were properly
> handled, but there are many well established schemes for doing this.

I think this is pretty much a non-starter as stated, because the v3
protocol requires all messages to have a preceding length word.  That's
not very negotiable.

What's already on the TODO list is to allow large field values to be
sent or received in segments, perhaps with a cursor-like arrangement.
You can do that today for blobs, but not for oversize regular table
fields.

Of course, considering that the maximum practical size of a regular
field is probably in the dozens of megabytes, and that RAM is getting
cheaper all the time, it's not clear that it's all that much of a
hardship for clients to buffer the whole thing.  If we've not gotten
around to this in the last dozen years, it's unlikely we'll get to it
in the future either ...

			regards, tom lane
Re: [HACKERS] Selectivity estimation for inet operators
I wrote:
> I spent a fair chunk of the weekend hacking on this patch to make it
> more understandable and fix up a lot of what seemed to me pretty clear
> arithmetic errors in the upper layers of the patch.  However, I
> couldn't quite convince myself to commit it, because the business
> around estimation for partial histogram-bucket matches still doesn't
> make any sense to me.

Actually, there's a second large problem with this patch: blindly
iterating through all combinations of MCV and histogram entries makes
the runtime O(N^2) in the statistics target.  I made up some test data
(by scanning my mail logs) and observed the following planning times,
as reported by EXPLAIN ANALYZE:

explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;

	stats target	eqjoinsel	networkjoinsel
	100		0.27 ms		1.85 ms
	1000		2.56 ms		167.2 ms
	10000		56.6 ms		13987.1 ms

I don't think it's necessary for network selectivity to be quite as fast
as eqjoinsel, but I doubt we can tolerate 14 seconds planning time for a
query that might need just milliseconds to execute :-(

It seemed to me that it might be possible to reduce the runtime by
exploiting knowledge about the ordering of the histograms, but I don't
have time to pursue that right now.

			regards, tom lane
Re: [HACKERS] Buildfarm not happy with test module move
Tom Lane wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> All of the MSVC critters are failing at "make check".
>> Yeah, I noticed that, thanks.  As far as I can see the only way to fix
>> it is to install dummy_seclabel to run the core seclabel test.  That
>> doesn't seem smart; I think it'd be better to remove that part of the
>> core seclabel test, and move the rest of the test to a new test in the
>> dummy_seclabel module.
> Sounds good to me.  The other parts of the core tests that depend on
> contrib modules aren't exactly good models to follow.

Pushed; tests pass for me, let's see what buildfarm says.

I think the input/ and output/ stuff is rather annoying.  I tried to
make dummy_seclabel an extension instead of a bare library, so that
CREATE EXTENSION inside the .sql loads it easily.  The problem there is
that dummy_seclabel doesn't have any sql-callable function, so the
module doesn't ever load.  I guess we could create a dummy function
which is there only to cause the module to get loaded ...

I am in touch with Andrew about adding a new stage to the buildfarm
client so that animals will build src/test/modules by default.  It
should work fine everywhere except MSVC, I think.  The issue with MSVC
continues to be that Install.pm and Mkvcbuild.pm need to be in sync
regarding what to build and what to install.  I could use help from some
MSVC-enabled developer there ...

In the meantime, I'm going to rebase commit_ts to src/test/modules and
get one bug I found there fixed, and will push that shortly too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Buildfarm not happy with test module move
Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Sounds good to me.  The other parts of the core tests that depend on
>> contrib modules aren't exactly good models to follow.
> Pushed; tests pass for me, let's see what buildfarm says.

Pushed?  Don't see it ...

			regards, tom lane
Re: [HACKERS] using Core Foundation locale functions
On Nov 28, 2014, at 8:43 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

Gotta say, I’m thrilled to see movement on this front, and especially
pleased to see how consensus seems to be building around an abstracted
interface to keep options open.  This platform-specific example really
highlights the need for it (I had no idea that there was separate and
more up-to-date collation support in Core Foundation than in the UNIX
layer of OS X).

Really looking forward to seeing where we end up.

Best,

David
Re: [HACKERS] Turning recovery.conf into GUCs
Alex Shulgin <a...@commandprompt.com> writes:
> Here's an attempt to revive this patch.

Here's the patch rebased against current HEAD, that is, including the
recently committed action_at_recovery_target option.

The default for the new GUC is 'pause', as in HEAD, and
pause_at_recovery_target is removed completely in favor of it.

I've also taken the liberty to remove the part that errors out when
finding $PGDATA/recovery.conf.

Now get your rotten tomatoes ready. ;-)

--
Alex

Attachment: recovery_guc_v5.4.patch.gz
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 12/1/14, 11:57 AM, Andres Freund wrote:
> On 2014-11-30 20:46:51 -0600, Jim Nasby wrote:
>> On 11/10/14, 7:52 PM, Tom Lane wrote:
>>> On the whole, I'm +1 for just logging the events and seeing what we
>>> learn that way.  That seems like an appropriate amount of effort for
>>> finding out whether there is really an issue.
>> Attached is a patch that does this.
> Unless somebody protests I plan to push this soon. I'll change two
> things:
>
> * Always use the same message, independent of scan_all. For one, Jim's
>   version would be untranslatable; for another, it's not actually
>   correct. In most cases we'll *not* wait, even if scan_all is true,
>   as we'll often just balk after !lazy_check_needs_freeze().

Good point.

> * Change the new bit in the errdetail. "could not acquire cleanup
>   lock" sounds too much like an error to me. "skipped %u pinned pages"
>   maybe?

Seems reasonable.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] No documentation on record *= record?
There doesn't seem to be documentation on *= (or search isn't finding
it).  Is this intentional?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] 9.2 recovery/startup problems with unlogged tables
On Wed, Nov 26, 2014 at 4:13 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Nov 25, 2014 at 9:30 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Using both 9.2.9 and 9_2_STABLE (9b468bcec15f1ea7433d4), I get a
>> fairly reproducible startup failure.
>>
>> What I was doing is restore a database from a base backup and roll it
>> forward with a recovery.conf until it completes and starts up.  Then I
>> truncate an unlogged table and start repopulating it with a large,
>> slow 'insert into ... select from' query.  While that is running, if I
>> reboot the server, it does not initially come back up on restart.
>> This is what I get in the log from the attempted restart:
>>
>> PST LOG:  database system was interrupted; last known up at 2014-11-25 15:40:33 PST
>> PST LOG:  database system was not properly shut down; automatic recovery in progress
>> PST LOG:  redo starts at 84/EF80
>> PST LOG:  record with zero length at 84/EF09AE18
>> PST LOG:  redo done at 84/EF09AD28
>> PST LOG:  last completed transaction was at log time 2014-11-25 15:42:09.173599-08
>> PST LOG:  checkpoint starting: end-of-recovery immediate
>> PST LOG:  checkpoint complete: wrote 103 buffers (0.2%); 0 transaction log file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s, total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
>> PST FATAL:  could not create file "base/16416/59288": File exists
>> PST LOG:  startup process (PID 2472) exited with exit code 1
>> PST LOG:  aborting startup due to startup process failure
>>
>> oid2name doesn't show me any 59288, so I think it is the new copy of
>> the unlogged table which is being created at the moment of the reboot.
>
> 59288 is not the new (uncommitted) unlogged table itself, but the new
> (uncommitted) toast table associated with that unlogged table.
>
> Immediately before the 'sudo /sbin/reboot', while the 'truncate
> unlogged_table; insert into unlogged_table select ...' is running, the
> two files below exist and have zero size:
>
> base/16416/59288_init
> base/16416/59288
>
> Immediately after the system has come back up, the file
> base/16416/59288_init no longer exists.

I can't reproduce this behavior without the reboot.  It seems to depend
on the sequencing and timing of the signals which the kernel delivers to
the running processes during the reboot.  If I do a pg_ctl stop -mf,
then both files go away.  If I do a pg_ctl stop -mi, then neither goes
away.  It is only with /sbin/reboot that I get the fatal combination of
_init being gone but the other file still present.

Then once the system is back and I restart the postmaster, the first
pass through ResetUnloggedRelationsInDbspaceDir doesn't remove
base/16416/59288, because base/16416/59288_init doesn't exist to trigger
the deletion.  On the 2nd pass through, base/16416/59288_init exists
because it was created by WAL replay, and the copy fails because it is
expecting base/16416/59288 to have already been cleaned up by the first
pass.

Should ResetUnloggedRelationsInDbspaceDir on the second pass attempt to
unlink the target immediately before the copy (line 338, in 9.2) to
accommodate cases like this?

I still haven't figured out what sequence of signals causes the init
fork to be removed but the main fork to remain, but I think the startup
process should be robust to this situation.

The attached patch adds an unconditional and unchecked unlink before the
copy.  It should probably have a comment, but I don't really know what
to say.  This resolves the situation for me.  It is expected that the
unlink will fail with non-existent files in the vast majority of cases.
Should it check that the unlink either succeeds, or fails for
non-existence, and throw an error in cases other than those?

Would it be a performance concern to call unlink potentially thousands
of times?  I don't think so, because it only calls the unlink when it is
about to copy a file anyway, which is a much heavier operation.

My understanding of the reason it was done in two passes in the first
place is that during hot standby, it is OK for the main fork to be
missing, but not OK for it to be gibberish.  But then upon promotion to
writable, it is no longer OK for it to be missing, so the files get
copied over at that time (and not earlier, as then init forks created
during replay wouldn't get copied).  Given that reasoning, I think that
unconditionally unlinking the destination file before the copy would not
be a problem.  The gibberish main fork would not be a problem during hot
standby queries, as the table's creation has not been committed and so
it can't be seen.

(I've added Robert to CC because he wrote the original code.)

Cheers,

Jeff

Attachment: reinit_unlink.patch
Re: [HACKERS] Buildfarm not happy with test module move
Tom Lane wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Sounds good to me.  The other parts of the core tests that depend on
>>> contrib modules aren't exactly good models to follow.
>> Pushed; tests pass for me, let's see what buildfarm says.
> Pushed?  Don't see it ...

Meh.  Done for real this time.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
> Please let me know what you think, all feedback is greatly appreciated.

Thanks for this.  Found time to do more of a review and have a few
comments:

> diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
> +/*
> + * Check whether the specified role has a specific role attribute.
> + */
> +bool
> +role_has_attribute(Oid roleid, RoleAttr attribute)
> +{
> +	RoleAttr	attributes;
> +	HeapTuple	tuple;
> +
> +	tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
> +
> +	if (!HeapTupleIsValid(tuple))
> +		ereport(ERROR,
> +				(errcode(ERRCODE_UNDEFINED_OBJECT),
> +				 errmsg("role with OID %u does not exist", roleid)));
> +
> +	attributes = ((Form_pg_authid) GETSTRUCT(tuple))->rolattr;
> +	ReleaseSysCache(tuple);
> +
> +	return ((attributes & attribute) > 0);
> +}

I'd put the superuser_arg() check in role_has_attribute.

>  bool
>  has_createrole_privilege(Oid roleid)
>  {
>  	/* Superusers bypass all permission checking. */
>  	if (superuser_arg(roleid))
>  		return true;
>
> !	return role_has_attribute(roleid, ROLE_ATTR_CREATEROLE);
>  }

And then ditch the individual has_X_privilege() calls (I note that you
didn't appear to add back the has_rolcatupdate() function..).

> diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
> +static RoleAttr
> +set_role_attribute(RoleAttr attributes, RoleAttr attribute)
> +{
> +	return ((attributes & ~(0x)) | attribute);
> +}

I don't think this is doing what you think it is..?  It certainly isn't
working as expected by AlterRole().

Rather than using a function for these relatively simple operations, why
not just have AlterRole() do:

if (isX >= 0)
	attributes = (isX > 0) ? attributes | ROLE_ATTR_X
						   : attributes & ~(ROLE_ATTR_X);

Otherwise, you'd probably need to pass a flag into set_role_attribute()
to indicate if you're setting or unsetting the bit, or have an
'unset_role_attribute()' function, but all of that seems like overkill..

> *** CreateRole(CreateRoleStmt *stmt)
>  	char	   *password = NULL;		/* user password */
>  	bool		encrypt_password = Password_encryption;	/* encrypt password? */
>  	char		encrypted_password[MD5_PASSWD_LEN + 1];
> !	bool		issuper = false;		/* Make the user a superuser? */
> !	bool		inherit = true;			/* Auto inherit privileges? */

I'd probably leave the whitespace-only changes out.

> *** AlterRoleSet(AlterRoleSetStmt *stmt)
>  	/*
>  	 * To mess with a superuser you gotta be superuser; else you need
>  	 * createrole, or just want to change your own settings
>  	 */
> !	attributes = ((Form_pg_authid) GETSTRUCT(roletuple))->rolattr;
> !	if ((attributes & ROLE_ATTR_SUPERUSER) > 0)
>  	{
>  		if (!superuser())
>  			ereport(ERROR,

We don't bother with the '> 0' in any of the existing bit testing that
goes on in the backend, so I don't think it makes sense to start now.
Just do

if (attributes & ROLE_ATTR_SUPERUSER) ...

etc and be done with it.

> *** aclitemin(PG_FUNCTION_ARGS)
>  	PG_RETURN_ACLITEM_P(aip);
>  }
>
> +
>  /*
>   * aclitemout
>   *		Allocates storage for, and fills in, a new null-delimited string

More whitespace changes... :/

> *** pg_role_aclcheck(Oid role_oid, Oid rolei
>  	return ACLCHECK_NO_PRIV;
>  }
>
> +/*
> + * has_role_attribute_id
> + *		Check the named role attribute on a role by given role oid.
> + */
> +Datum
> +has_role_attribute_id(PG_FUNCTION_ARGS)
> +{
> +	Oid			roleoid = PG_GETARG_OID(0);
> +	text	   *attr_type_text = PG_GETARG_TEXT_P(1);
> +	RoleAttr
Re: [HACKERS] bug in json_to_record with arrays
On 11/28/2014 12:55 PM, Tom Lane wrote:
> * This would only really address Josh's complaint if we were to
> back-patch it into 9.4, but I'm hesitant to change error texts
> post-RC1.  Thoughts?

If we don't fix an error text in an RC, what would we fix in an RC?
That's as minor as things get.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] bug in json_to_record with arrays
Josh Berkus <j...@agliodbs.com> writes:
> On 11/28/2014 12:55 PM, Tom Lane wrote:
>> * This would only really address Josh's complaint if we were to
>> back-patch it into 9.4, but I'm hesitant to change error texts
>> post-RC1.  Thoughts?
> If we don't fix an error text in an RC, what would we fix in an RC?
> That's as minor as things get.

Code-wise, yeah, but it would put some pressure on the translators.

What did you think of the new error texts in themselves?

			regards, tom lane
Re: [HACKERS] bug in json_to_record with arrays
On 12/01/2014 12:28 PM, Josh Berkus wrote: On 11/28/2014 12:55 PM, Tom Lane wrote: * This would only really address Josh's complaint if we were to back-patch it into 9.4, but I'm hesitant to change error texts post-RC1. Thoughts? If we don't fix an error text in an RC, what would we fix in an RC? That's as minor as things get. I think the idea is that you only fix *major* things in an RC? That said, I agree that we should fix something so minor. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc If we send our children to Caesar for their education, we should not be surprised when they come back as Romans. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in json_to_record with arrays
On 2014-12-01 15:30:06 -0500, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 11/28/2014 12:55 PM, Tom Lane wrote: * This would only really address Josh's complaint if we were to back-patch it into 9.4, but I'm hesitant to change error texts post-RC1. Thoughts? If we don't fix an error text in an RC, what would we fix in an RC? That's as minor as things get. Code-wise, yeah, but it would put some pressure on the translators. I think an untranslated but on-point error message is better than a translated confusing one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in json_to_record with arrays
On 12/01/2014 12:30 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 11/28/2014 12:55 PM, Tom Lane wrote: * This would only really address Josh's complaint if we were to back-patch it into 9.4, but I'm hesitant to change error texts post-RC1. Thoughts? If we don't fix an error text in an RC, what would we fix in an RC? That's as minor as things get. Code-wise, yeah, but it would put some pressure on the translators. What did you think of the new error texts in themselves? I would prefer invalid input syntax to malformed array literal, but I'll take anything we can backpatch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in json_to_record with arrays
Josh Berkus j...@agliodbs.com writes: On 12/01/2014 12:30 PM, Tom Lane wrote: Code-wise, yeah, but it would put some pressure on the translators. What did you think of the new error texts in themselves? I would prefer invalid input syntax to malformed array literal, but I'll take anything we can backpatch. I think if we're changing them at all, it's about the same cost no matter what we change them to exactly. I'd be good with going to invalid input syntax if we also change record_in to match. Anyone have a feeling pro or con about that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in json_to_record with arrays
On Mon, Dec 1, 2014 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 12/01/2014 12:30 PM, Tom Lane wrote: Code-wise, yeah, but it would put some pressure on the translators. What did you think of the new error texts in themselves? I would prefer invalid input syntax to malformed array literal, but I'll take anything we can backpatch. I think if we're changing them at all, it's about the same cost no matter what we change them to exactly. I'd be good with going to invalid input syntax if we also change record_in to match. Anyone have a feeling pro or con about that? I don't know. malformed array literal is a mighty big clue that you have a bad postgres format array literal and will be well supported by googling -- this problem isn't new. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
Hi all, while working on the patch decreasing amount of memory consumed by array_agg [1], I've run into some strange OOM issues. Reproducing them using the attached SQL script is rather simple. [1] https://commitfest.postgresql.org/action/patch_view?id=1652 At first I thought there's some rare hidden memory leak, but I'm pretty sure that's not the case. I've even put some explicit counters into aset.c counting allocated/freed blocks, and it seems to be working just fine (and matching the context tree perfectly). So no memory leak. The only thing I can think of is some interaction (fragmentation?) with the system allocator (I'm using glibc-2.19 and kernel 3.16.1, btw), making the RSS values even less useful than I thought. Sadly, it seems to trigger the OOM killer :-( It's entirely possible that this is a known behavior of the allocator, and I've been unaware of it. It's also true that the work_mem values are really excessive, and the actual values would be much lower, which would make the issues much less serious. To demonstrate the problem I'll use the attached SQL script - it's a rather simple script generating sample tables and then executing trivial array_agg() queries. The script sets work_mem to two different values: work_mem = '1GB'== this limits the INSERT (generating data) work_mem = '1024GB' == bogus value, forcing a hash aggregate (assuming the hash table fits into memory) The size of the sample tables (and the amount of memory needed for the hash aggregate) is determined by the first parameter set in the script. With this value \set size 1 I get an OOM crash on the first execution of the SQL script (on a machine with 8GB of RAM and 512MB shared buffers), but YMMV. The problem is, that even with a much smaller dataset (say, using size 7500) you'll get an OOM error after several executions of the script. How many executions are needed seems to be inversely proportional to the size of the data set. 
The problem is that the RSS amount is increasing over time for some reason. For example with the size = 5000, the memory stats for the process look like this over the first few minutes: VIRTRESSHR %CPU %MEM TIME+ COMMAND 5045508 2,818g 187220 51,9 36,6 0:15.39 postgres: INSERT 5045508 3,600g 214868 62,8 46,8 3:11.58 postgres: INSERT 5045508 3,771g 214868 50,9 49,0 3:40.03 postgres: INSERT 5045508 3,840g 214868 48,5 49,9 4:00.71 postgres: INSERT 5045508 3,978g 214880 51,5 51,7 4:40.73 postgres: INSERT 5045508 4,107g 214892 53,2 53,4 5:22.04 postgres: INSERT 5045508 4,297g 215276 53,9 55,9 6:22.63 postgres: INSERT ... Those are rows for the backend process, captured from top over time. How long the backend was running is in the TIME column. Each iteration takes ~30 seconds, so those lines represent approximately iterations 1, 6, 7, 8, 11, etc. Notice how the RSS value grows over time, and also notice that this is the INSERT, restricted by work_mem=1GB. So the memory consumption should be ~1.5GB, and MemoryContextStats(TopMemoryContext) collected at this point is consistent with that (see the mem-ctx.log). And then the value stabilizes at ~4,430g and stops growing. With size 7500 it however takes only ~20 iterations to reach the OOM issue, with a crash log like this: [10560.843547] Killed process 15862 (postgres) total-vm:7198260kB, anon-rss:6494136kB, file-rss:300436kB So, any ideas what might be the culprit here? As I said, this is clearly made worse by inappropriately high work_mem values, but I'm not sure it's completely harmless. Imagine for example long-running backends, executing complex queries with inaccurate estimates. That may easily result in using much more memory than the work_mem limit, and increasing the RSS value over time. 
regards Tomas array-agg.sql Description: application/sql TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used TopTransactionContext: 8192 total in 1 blocks; 7904 free (0 chunks); 288 used TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used Attopt cache: 8192 total in 1 blocks; 2704 free (0 chunks); 5488 used Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used MessageContext: 32768 total in 3 blocks; 9640 free (0 chunks); 23128 used Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used smgr relation table: 24576 total in 2 blocks; 9808 free (4 chunks); 14768 used TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used ExecutorState: 318758960 total in 40 blocks; 5879664 free (8 chunks);
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
On 12/01/2014 05:39 PM, Tomas Vondra wrote: Hi all, while working on the patch decreasing amount of memory consumed by array_agg [1], I've ran into some strange OOM issues. Reproducing them using the attached SQL script is rather simple. [1] https://commitfest.postgresql.org/action/patch_view?id=1652 At first I thought there's some rare hidden memory leak, but I'm pretty sure that's not the case. I've even put some explicit counters into aset.c counting allocated/freed blocks, and it seems to be working just fine (and matching the context tree perfectly). So no memory leak. Doesn't this line: TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used look pretty suspicious? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
On 2.12.2014 00:31, Andrew Dunstan wrote: On 12/01/2014 05:39 PM, Tomas Vondra wrote: Hi all, while working on the patch decreasing amount of memory consumed by array_agg [1], I've ran into some strange OOM issues. Reproducing them using the attached SQL script is rather simple. [1] https://commitfest.postgresql.org/action/patch_view?id=1652 At first I thought there's some rare hidden memory leak, but I'm pretty sure that's not the case. I've even put some explicit counters into aset.c counting allocated/freed blocks, and it seems to be working just fine (and matching the context tree perfectly). So no memory leak. Doesn't this line: TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used look pretty suspicious? It certainly looks a bit large, especially considering this is a fresh cluster with a single DB, containing a single table (created by user), no doubt about that. For comparison, this is a new backend: TopMemoryContext: 78128 total in 11 blocks; 8528 free (16 chunks); 69600 used OTOH, it's just 130MB, and the RSS values are ~6GB when the OOM hits. For the record, I only tweaked these settings in the config file: test=# select name, setting from pg_settings where source like '%configuration file%'; name| setting +--- checkpoint_segments| 32 DateStyle | ISO, DMY default_text_search_config | pg_catalog.simple dynamic_shared_memory_type | posix lc_messages| cs_CZ.UTF-8 lc_monetary| cs_CZ.UTF-8 lc_numeric | cs_CZ.UTF-8 lc_time| cs_CZ.UTF-8 log_timezone | Europe/Prague maintenance_work_mem | 524288 max_connections| 100 shared_buffers | 65536 TimeZone | Europe/Prague work_mem | 1048576 (14 rows) Some of those are set by the initdb, of course. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in json_to_record with arrays
Merlin Moncure mmonc...@gmail.com writes: On Mon, Dec 1, 2014 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be good with going to invalid input syntax if we also change record_in to match. Anyone have a feeling pro or con about that? I don't know. malformed array literal is a mighty big clue that you have a bad postgres format array literal and will be well supported by googling -- this problem isn't new. Good point about googling, and after all we are already using malformed array literal for a subset of these cases. Let's stick with that wording. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
Tomas Vondra t...@fuzzy.cz writes: On 2.12.2014 00:31, Andrew Dunstan wrote: Doesn't this line: TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used look pretty suspicious? It certainly looks a bit large, especially considering this is a fresh cluster with a single DB, containing a single table (created by user), no doubt about that. OTOH, it's just 130MB, and the RSS values are ~6GB when the OOM hits. Yeah, but what was that 130MB being used for? It's not generally considered good practice to put large or random stuff in TopMemoryContext --- arguably, any code doing so is broken already, because almost by definition such allocations won't be reclaimed on error. What I suspect you're looking at here is the detritus of creation of a huge number of memory contexts. mcxt.c keeps its own state about existing contexts in TopMemoryContext. So, if we posit that those contexts weren't real small, there's certainly room to believe that there was some major memory bloat going on recently. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin a...@commandprompt.com wrote: Here's the patch rebased against current HEAD, that is including the recently committed action_at_recovery_target option. If this patch gets in, it gives a good argument to jump to 10.0 IMO. That's not a bad thing, only the cost of making recovery params as GUCs which is still a feature wanted. The default for the new GUC is 'pause', as in HEAD, and pause_at_recovery_target is removed completely in favor of it. Makes sense. Another idea that popped out was to rename this parameter as recovery_target_action as well, but that's not really something this patch should care about. I've also taken the liberty to remove that part that errors out when finding $PGDATA/recovery.conf. I am not in favor of this part. It may be better to let the users know that their old configuration is not valid anymore with an error. This patch cuts in the flesh with a huge axe, let's be sure that users do not ignore the side pain effects, or recovery.conf would be simply ignored and users would not be aware of that. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
On 2.12.2014 01:33, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 2.12.2014 00:31, Andrew Dunstan wrote: Doesn't this line: TopMemoryContext: 136614192 total in 16678 blocks; 136005936 free (500017 chunks); 608256 used look pretty suspicious? It certainly looks a bit large, especially considering this is a fresh cluster with a single DB, containing a single table (created by user), no doubt about that. OTOH, it's just 130MB, and the RSS values are ~6GB when the OOM hits. Yeah, but what was that 130MB being used for? It's not generally considered good practice to put large or random stuff in TopMemoryContext --- arguably, any code doing so is broken already, because almost by definition such allocations won't be reclaimed on error. What I suspect you're looking at here is the detritus of creation of a huge number of memory contexts. mcxt.c keeps its own state about existing contexts in TopMemoryContext. So, if we posit that those contexts weren't real small, there's certainly room to believe that there was some major memory bloat going on recently. Aha! MemoryContextCreate allocates the memory for the new context from TopMemoryContext explicitly, so that it survives resets of the parent context. Is that what you had in mind by keeping state about existing contexts? That'd probably explain the TopMemoryContext size, because array_agg() creates separate context for each group. So if you have 1M groups, you have 1M contexts. Although I don't see how the size of those contexts would matter? Maybe we could move this info (list of child contexts) to the parent context somehow, so that it's freed when the context is destroyed? Aside from the regular blocks, of course. But maybe it's not worth the effort, because you can only have so many memory contexts created concurrently. For AllocSet it's at most RAM/1kB (because that's the minimum block size) before the OOM intervenes. 
And in this case the contexts have 8kB initial size, so the number of contexts is even lower. So maybe it's not worth the effort. Also, this explains the TopMemoryContext size, but not the RSS size (or am I missing something)? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
Tomas Vondra t...@fuzzy.cz writes: On 2.12.2014 01:33, Tom Lane wrote: What I suspect you're looking at here is the detritus of creation of a huge number of memory contexts. mcxt.c keeps its own state about existing contents in TopMemoryContext. So, if we posit that those contexts weren't real small, there's certainly room to believe that there was some major memory bloat going on recently. Aha! MemoryContextCreate allocates the memory for the new context from TopMemoryContext explicitly, so that it survives resets of the parent context. Is that what you had in mind by keeping state about existing contexts? Right. That'd probably explain the TopMemoryContext size, because array_agg() creates separate context for each group. So if you have 1M groups, you have 1M contexts. Although I don's see how the size of those contexts would matter? Well, if they're each 6K, that's your 6GB right there. Maybe we could move this info (list of child contexts) to the parent context somehow, so that it's freed when the context is destroyed? We intentionally didn't do that, because in many cases it'd result in parent contexts becoming nonempty when otherwise they'd never have anything actually in them. The idea was that such shell parent contexts should be cheap, requiring only a control block in TopMemoryContext and not an actual allocation arena. This idea of a million separate child contexts was never part of the design of course; we might need to rethink whether that's a good idea. Or maybe there need to be two different policies about where to put child control blocks. Also, this explains the TopMemoryContext size, but not the RSS size (or am I missing something)? Very possibly you're left with islands that prevent reclaiming very much of the peak RAM usage. It'd be hard to be sure without some sort of memory map, of course. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Stephen, Thanks for this. Found time to do more of a review and have a few comments: Thanks for taking a look at this and for the feedback. I'd put the superuser_arg() check in role_has_attribute. Ok. Though, this would affect how CATUPDATE is handled. Peter Eisentraut previously raised a question about whether superuser checks should be included with catupdate which led me to create the following post. http://www.postgresql.org/message-id/cakrt6cqovt2kiykg2gff7h9k8+jvu1149zlb0extkkk7taq...@mail.gmail.com Certainly, we could keep has_rolcatupdate for this case and put the superuser check in role_has_attribute, but it seems like it might be worth taking a look at whether a superuser can bypass catupdate or not. Just a thought. And then ditch the individual has_X_privilege() calls (I note that you didn't appear to add back the has_rolcatupdate() function..). Ok. I had originally thought for this patch that I would try to minimize these types of changes, though perhaps this is that opportunity previously mentioned in refactoring those. However, the catupdate question still remains. + static RoleAttr + set_role_attribute(RoleAttr attributes, RoleAttr attribute) + { + return ((attributes & ~(0x)) | attribute); + } I don't think this is doing what you think it is..? It certainly isn't working as expected by AlterRole(). Rather than using a function for these relatively simple operations, why not just have AlterRole() do: if (isX >= 0) attributes = (isX > 0) ? attributes | ROLE_ATTR_X : attributes & ~(ROLE_ATTR_X); Yes, this was originally my first approach. I'm not recollecting why I decided on the other route, but agree that is preferable and simpler. Otherwise, you'd probably need to pass a flag into set_role_attribute() to indicate if you're setting or unsetting the bit, or have an 'unset_role_attribute()' function, but all of that seems like overkill.. Agreed. 
We don't bother with the '> 0' in any of the existing bit testing that goes on in the backend, so I don't think it makes sense to start now. Just do if (attributes & ROLE_ATTR_SUPERUSER) ... etc and be done with it. Ok, easy enough. Why not have this as 'pg_has_role_id_attribute' and then have a 'pg_has_role_name_attribute' also? Seems like going with the pg_ namespace is the direction we want to go in, though I'm not inclined to argue about it if folks prefer has_X. I have no reason for one over the other, though I did ask myself that question. I did find it curious that in some cases there is has_X and then in others pg_has_X. Perhaps I'm not looking in the right places, but I haven't found anything that helps to distinguish when one vs the other is appropriate (even if it is a general rule of thumb). Seems like you could just make temp_array be N_ROLE_ATTRIBUTES in size, instead of building a list, counting it, and then building the array from that..? Yes, this is true. Do we really need to keep has_rolinherit for any reason..? Seems unnecessary given it's only got one call site and it's nearly the same as a call to role_has_attribute() anyway. Ditto with has_rolreplication(). I really don't see any reason and as above, I can certainly do those refactors now if that is what is desired. Thought we were getting rid of this..? ! #define N_ROLE_ATTRIBUTES 8 /* 1 plus the last 1<<x */ ! #define ROLE_ATTR_NONE 0 ! #define ROLE_ATTR_ALL 255 /* All currently available attributes. */ Or: #define ROLE_ATTR_ALL (1<<N_ROLE_ATTRIBUTES)-1 Yes, we were, however the latter causes a syntax error with initdb. :-/ ! DATA(insert OID = 10 ( POSTGRES PGROLATTRALL -1 _null_ _null_)); #define BOOTSTRAP_SUPERUSERID 10 Is it actually necessary to do this substitution when the value is #define'd in the same .h file...? Might be something to check, if you haven't already. Yes, I had previously checked this, I get the following error from initdb. 
FATAL: invalid input syntax for integer: ROLE_ATTR_ALL + #define ROLE_ATTR_SUPERUSER (1<<0) + #define ROLE_ATTR_INHERIT (1<<1) + #define ROLE_ATTR_CREATEROLE (1<<2) + #define ROLE_ATTR_CREATEDB (1<<3) + #define ROLE_ATTR_CATUPDATE (1<<4) + #define ROLE_ATTR_CANLOGIN (1<<5) + #define ROLE_ATTR_REPLICATION (1<<6) + #define ROLE_ATTR_BYPASSRLS (1<<7) + #define N_ROLE_ATTRIBUTES 8 + #define ROLE_ATTR_NONE 0 These shouldn't need to be here any more..? No they shouldn't, not sure how I missed that. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] tracking commit timestamps
Petr Jelinek wrote: On 25/11/14 16:30, Alvaro Herrera wrote: Petr Jelinek wrote: On 25/11/14 16:23, Alvaro Herrera wrote: Robert Haas wrote: Maybe 0 should get translated to a NULL return, instead of a bogus timestamp. That's one idea --- surely no transaction is going to commit at 00:00:00 on 2000-01-01 anymore. Yet this is somewhat discomforting. I solved it for xids that are out of range by returning -infinity and then changing that to NULL in sql interface, but no idea how to do that for aborted transactions. I guess the idea is that we just read the value from the slru and if it exactly matches allballs we do the same -infinity return and translation to NULL. (Do we really love this -infinity idea? If it's just an internal API we can use a boolean instead.) As in returning boolean instead of void as found? That works for me (for the C interface). Petr sent me privately some changes to implement this idea. I also reworked the tests so that they only happen on src/test/modules (getting rid of the one in core regress) and made them work with both enabled and disabled track_commit_timestamps; in make installcheck, they pass regardless of the setting of the installed server, and in make check, they run a server with the setting enabled. I made two more changes: 1. introduce newestCommitTs. Original code was using lastCommitXact to check that no future transaction is asked for, but this doesn't really work if a long-running transaction is committed, because asking for transactions with a higher Xid but which were committed earlier would raise an error. 2. change CommitTsControlLock to CommitTsLock as the lock that protects the stuff we keep in ShmemVariableCache. The Control one is for SLRU access, and so it might be slow at times. This is important because we fill the checkpoint struct from values protected by that lock, so a checkpoint might be delayed if it happens to land in the middle of a slru IO operation. 
-- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c index 3b8241b..f0a023f 100644 --- a/contrib/pg_upgrade/pg_upgrade.c +++ b/contrib/pg_upgrade/pg_upgrade.c @@ -423,8 +423,10 @@ copy_clog_xlog_xid(void) /* set the next transaction id and epoch of the new cluster */ prep_status(Setting next transaction ID and epoch for new cluster); exec_prog(UTILITY_LOG_FILE, NULL, true, - \%s/pg_resetxlog\ -f -x %u \%s\, - new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, + \%s/pg_resetxlog\ -f -x %u -c %u \%s\, + new_cluster.bindir, + old_cluster.controldata.chkpnt_nxtxid, + old_cluster.controldata.chkpnt_nxtxid, new_cluster.pgdata); exec_prog(UTILITY_LOG_FILE, NULL, true, \%s/pg_resetxlog\ -f -e %u \%s\, diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c index 9397198..180818d 100644 --- a/contrib/pg_xlogdump/rmgrdesc.c +++ b/contrib/pg_xlogdump/rmgrdesc.c @@ -10,6 +10,7 @@ #include access/brin_xlog.h #include access/clog.h +#include access/commit_ts.h #include access/gin.h #include access/gist_private.h #include access/hash.h diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ab8c263..e3713d3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2673,6 +2673,20 @@ include_dir 'conf.d' /listitem /varlistentry + varlistentry id=guc-track-commit-timestamp xreflabel=track_commit_timestamp + termvarnametrack_commit_timestamp/varname (typebool/type)/term + indexterm + primaryvarnametrack_commit_timestamp/ configuration parameter/primary + /indexterm + listitem + para +Record commit time of transactions. This parameter +can only be set in filenamepostgresql.conf/ file or on the server +command line. The default value is literaloff/literal. 
+ /para + /listitem + /varlistentry + /variablelist /sect2 diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index baf81ee..62ec275 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15938,6 +15938,45 @@ SELECT collation for ('foo' COLLATE de_DE); For example literal10:20:10,14,15/literal means literalxmin=10, xmax=20, xip_list=10, 14, 15/literal. /para + + para +The functions shown in xref linkend=functions-commit-timestamp +provide information about transactions that have been already committed. +These functions mainly provide information about when the transactions +were committed. They only provide useful data when +xref linkend=guc-track-commit-timestamp configuration option is enabled +and only for transactions that were committed after it was enabled. + /para + + table id=functions-commit-timestamp +titleCommitted transaction information/title +tgroup cols=3 + thead +
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On 12/01/2014 09:51 PM, Marco Nenciarini wrote: I think this is a leftover, as you don't use elog afterwards. Good catch, fixed. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 63edf0571961f2f2c7fd6a08b99e747ee39a3377 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@2ndquadrant.com Date: Fri, 12 Sep 2014 12:41:35 +0800 Subject: [PATCH 1/2] Use GetSystemTimeAsFileTime directly in windows gettimeofday MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PostgreSQL was calling GetSystemTime followed by SystemTimeToFileTime in the win32 port gettimeofday function. This is not necessary and limits the reported precision to the 1ms granularity that the SYSTEMTIME struct can represent. By using GetSystemTimeAsFileTime we avoid unnecessary conversions and capture timestamps at 100ns granularity, which is then rounded to 1µs granularity for storage in a PostgreSQL timestamp. On most Windows systems this change will actually have no significant effect as the system timer tick is typically between 1ms and 15ms depending on what timer resolution currently running applications have requested. You can check this with clockres.exe from sysinternals. Despite the platform limitation this change still permits capture of finer timestamps where the system is capable of producing them and it gets rid of an unnecessary syscall. Future work may permit use of GetSystemTimePreciseAsFileTime on Windows 8 and Windows Server 2012 for higher resolution time capture. This call has the same interface as GetSystemTimeAsFileTime. 
--- src/port/gettimeofday.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c index 75a9199..73ec406 100644 --- a/src/port/gettimeofday.c +++ b/src/port/gettimeofday.c @@ -44,16 +44,14 @@ int gettimeofday(struct timeval * tp, struct timezone * tzp) { FILETIME file_time; - SYSTEMTIME system_time; ULARGE_INTEGER ularge; - GetSystemTime(&system_time); - SystemTimeToFileTime(&system_time, &file_time); + GetSystemTimeAsFileTime(&file_time); ularge.LowPart = file_time.dwLowDateTime; ularge.HighPart = file_time.dwHighDateTime; tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L); - tp->tv_usec = (long) (system_time.wMilliseconds * 1000); + tp->tv_usec = (long) (((ularge.QuadPart - epoch) % 10000000L) / 10); return 0; } -- 1.9.3 From 035bb3b19f93560d18834d2daf397c70e1555015 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@2ndquadrant.com Date: Thu, 18 Sep 2014 23:02:14 +0800 Subject: [PATCH 2/2] Use GetSystemTimePreciseAsFileTime when available This will cause PostgreSQL on Windows 8 or Windows Server 2012 to obtain high-resolution timestamps while allowing the same binaries to run without problems on older releases. 
---
 src/backend/main/main.c |  6 ++++++
 src/include/port.h      |  2 ++
 src/port/gettimeofday.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c51b391..73c30c5 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -260,6 +260,12 @@ startup_hacks(const char *progname)

 		/* In case of general protection fault, don't show GUI popup box */
 		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+		/* Figure out which syscall to use to capture timestamp information */
+		init_win32_gettimeofday();
+#endif
+
 	}
 #endif   /* WIN32 */

diff --git a/src/include/port.h b/src/include/port.h
index 94a0e2f..58677ec 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On windows we need to call some backend start setup for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else							/* !WIN32 */

diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 73ec406..ab4f491 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -35,9 +35,57 @@
 static const unsigned __int64 epoch = UINT64CONST(116444736000000000);

 /*
+ * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
+ * signature, so we can just store a pointer to whichever we find. This
+ * is the pointer's type.
+ */
+typedef VOID (WINAPI *PgGetSystemTimeFn)(LPFILETIME);
+
+/* Storage for the function we pick at runtime */
+static PgGetSystemTimeFn pg_get_system_time = NULL;
+
+/*
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not, fall back to GetSystemTimeAsFileTime.
+ */
+void
+init_win32_gettimeofday(void)
+{
+	/*
+	 * Because it's guaranteed that kernel32.dll will be linked into our
+	 * address space already,
Re: [HACKERS] inherit support for foreign tables
(2014/11/28 18:14), Ashutosh Bapat wrote:

On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Apart from the above, I noticed that the patch doesn't consider calling ExplainForeignModify during EXPLAIN for an inherited UPDATE/DELETE, as shown below (note that there are no UPDATE remote queries displayed). So, I'd like to modify explain.c to show those queries like this:

postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
         Output: (parent.a * 2), parent.ctid
         Filter: (parent.a = 5)
   Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft1.a * 2), ft1.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
   Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
         Output: (ft2.a * 2), ft2.ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
(12 rows)

Two Remote SQL entries under a single node would be confusing. Also, the node is labelled as a Foreign Scan; it would be confusing to show an UPDATE command under this scan node.

I thought of this as an extension of the existing (ie, non-inherited) case (see the example below) to the inherited case.

postgres=# explain verbose update ft1 set a = a * 2 where a = 5;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Update on public.ft1  (cost=100.00..140.38 rows=12 width=10)
   Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
         Output: (a * 2), ctid
         Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
(5 rows)

I think we should show the update commands somewhere for the inherited case, as for the non-inherited case. Alternatives to this are welcome.
BTW, I was experimenting with DMLs being executed on multiple FDW servers under the same transaction and found that the transactions may not be atomic (and may be inconsistent) if one or more of the servers fail to commit while the rest commit the transaction. The reason for this is that we do not roll back the already-committed changes on the foreign servers if one or more of them fail to commit a transaction. With foreign tables in an inheritance hierarchy, a single DML can span multiple servers and the result may not be atomic (and may be inconsistent).

So, IIUC, even transactions over the local server and a *single* remote server are not guaranteed to be executed atomically in the current form. It is possible that the remote transaction succeeds and the local one fails, for example, resulting in data inconsistency between the local and the remote.

Either we have to disable DMLs on an inheritance hierarchy which spans multiple servers, OR make sure that such transactions follow 2PC norms.

-1 for disabling update queries on such an inheritance hierarchy, because I think we should leave that to the user's judgment. And I think 2PC is definitely a separate patch.

Thanks,

Best regards,
Etsuro Fujita
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Sun, 2014-11-30 at 17:49 -0800, Peter Geoghegan wrote:
> On Mon, Nov 17, 2014 at 11:39 PM, Jeff Davis pg...@j-davis.com wrote:
> > I can also just move isReset there, and keep mem_allocated as a uint64.
> > That way, if I find later that I want to track the aggregated value for
> > the child contexts as well, I can split it into two uint32s. I'll hold
> > off on any such optimizations until I see some numbers from HashAgg
> > though.
>
> I took a quick look at memory-accounting-v8.patch.
>
> Is there some reason why mem_allocated is a uint64? All other things
> being equal, I'd follow the example of tuplesort.c's
> MemoryContextAllocHuge() API, which (following bugfix commit 79e0f87a1)
> uses int64 variables to track available memory and so on.

No reason. New version attached; that's the only change.

Regards,
	Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***************
*** 438,451 ****
  AllocSetContextCreate(MemoryContext parent,
  					  Size initBlockSize,
  					  Size maxBlockSize)
  {
! 	AllocSet	context;
  
  	/* Do the type-independent part of context creation */
! 	context = (AllocSet) MemoryContextCreate(T_AllocSetContext,
! 											 sizeof(AllocSetContext),
! 											 &AllocSetMethods,
! 											 parent,
! 											 name);
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
--- 438,454 ----
  					  Size initBlockSize,
  					  Size maxBlockSize)
  {
! 	AllocSet	set;
! 	MemoryContext context;
  
  	/* Do the type-independent part of context creation */
! 	context = MemoryContextCreate(T_AllocSetContext,
! 								  sizeof(AllocSetContext),
! 								  &AllocSetMethods,
! 								  parent,
! 								  name);
! 
! 	set = (AllocSet) context;
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
*************** AllocSetContextCreate(MemoryContext parent,
*** 459,467 ****
  	if (maxBlockSize < initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	context->initBlockSize = initBlockSize;
! 	context->maxBlockSize = maxBlockSize;
! 	context->nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
--- 462,470 ----
  	if (maxBlockSize < initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	set->initBlockSize = initBlockSize;
! 	set->maxBlockSize = maxBlockSize;
! 	set->nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
*************** AllocSetContextCreate(MemoryContext parent,
*** 477,486 ****
  	 * and actually-allocated sizes of any chunk must be on the same side of
  	 * the limit, else we get confused about whether the chunk is big.
  	 */
! 	context->allocChunkLimit = ALLOC_CHUNK_LIMIT;
! 	while ((Size) (context->allocChunkLimit + ALLOC_CHUNKHDRSZ) >
  		   (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION))
! 		context->allocChunkLimit >>= 1;
  
  	/*
  	 * Grab always-allocated space, if requested
--- 480,489 ----
  	 * and actually-allocated sizes of any chunk must be on the same side of
  	 * the limit, else we get confused about whether the chunk is big.
  	 */
! 	set->allocChunkLimit = ALLOC_CHUNK_LIMIT;
! 	while ((Size) (set->allocChunkLimit + ALLOC_CHUNKHDRSZ) >
  		   (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION))
! 		set->allocChunkLimit >>= 1;
  
  	/*
  	 * Grab always-allocated space, if requested
*************** AllocSetContextCreate(MemoryContext parent,
*** 500,519 ****
  					 errdetail("Failed while creating memory context \"%s\".",
  							   name)));
  		}
! 		block->aset = context;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
! 		block->next = context->blocks;
! 		context->blocks = block;
  		/* Mark block as not to be released at reset time */
! 		context->keeper = block;
  
  		/* Mark unallocated space NOACCESS; leave the block header alone. */
  		VALGRIND_MAKE_MEM_NOACCESS(block->freeptr, blksize - ALLOC_BLOCKHDRSZ);
  	}
  
! 	return (MemoryContext) context;
  }
  
  /*
--- 503,525 ----
  					 errdetail("Failed while creating memory context \"%s\".",
  							   name)));
  		}
! 
! 		context->mem_allocated += blksize;
! 
! 		block->aset = set;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
! 		block->next = set->blocks;
! 		set->blocks = block;
  		/* Mark block as not to be released at reset time */
! 		set->keeper = block;
  
  		/* Mark unallocated space NOACCESS; leave the block header alone. */
  		VALGRIND_MAKE_MEM_NOACCESS(block->freeptr, blksize - ALLOC_BLOCKHDRSZ);
  	}
  
! 	return context;
  }
  
  /*
*************** AllocSetReset(MemoryContext context)
*** 590,595 ****
--- 596,603 ----
  	else
  	{
  		/* Normal case, release the block */
+ 		context->mem_allocated
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Mon, Dec 1, 2014 at 11:29 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
> On Thu, Nov 27, 2014 at 11:43 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> > If you get that done, I'll have an extra look at it and then let's have
> > a committer look at it.
> Attached patch is the latest version. I added new enum values like
> REINDEX_OBJECT_XXX, and changed the ReindexObject() function. The ACL
> problem is also fixed in this version.

Thanks for the updated version. I have been looking at this patch more
deeply, and I noticed a couple of things that were missing or mishandled:
- The patch completely ignored that a catalog schema could be reindexed.
- When reindexing pg_catalog as a schema, it is critical to reindex
  pg_class first, as reindex_relation updates pg_class.
- gram.y generated some warnings because ReindexObjectType stuff was
  casted as ObjectType.
- There was a memory leak: the scan keys palloc'd in ReindexObject were
  not pfree'd.
- Using do_user, do_system and now do_database was really confusing and
  reduced code readability. I reworked it using the object kind.
- The patch to support SCHEMA in reindexdb had been forgotten.
  Reattaching it here.

Adding on top of that a couple of things cleaned up, like docs and typos,
I got the patch attached. Let's have a committer have a look at it now; I
am marking this as Ready for Committer.

Regards,
-- 
Michael

From 1e384f579ffa865c364616573bd8767a24306c44 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 2 Dec 2014 15:29:36 +0900
Subject: [PATCH 1/2] Support for REINDEX SCHEMA

When pg_catalog is reindexed, note that pg_class is reindexed first for a
behavior consistent with the cases of DATABASE and SYSTEM.
---
 doc/src/sgml/ref/reindex.sgml              |  15 +++-
 src/backend/commands/indexcmds.c           | 110 ++++++++++++++++-------
 src/backend/parser/gram.y                  |  35 ++++----
 src/backend/tcop/utility.c                 |  15 ++--
 src/bin/psql/tab-complete.c                |   4 +-
 src/include/commands/defrem.h              |   3 +-
 src/include/nodes/parsenodes.h             |  11 ++-
 src/test/regress/expected/create_index.out |  31 +++++++
 src/test/regress/sql/create_index.sql      |  23 ++++++
 9 files changed, 191 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index cabae19..0a4c7d4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation

 <refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
 </refsynopsisdiv>

@@ -101,6 +101,19 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
   </varlistentry>

   <varlistentry>
+   <term><literal>SCHEMA</literal></term>
+   <listitem>
+    <para>
+     Recreate all indexes of the specified schema.  If a table of this
+     schema has a secondary <quote>TOAST</quote> table, that is reindexed as
+     well.  Indexes on shared system catalogs are also processed.
+     This form of <command>REINDEX</command> cannot be executed inside a
+     transaction block.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><literal>DATABASE</literal></term>
    <listitem>
     <para>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 12b4ac7..a3e8a15 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1777,34 +1777,58 @@ ReindexTable(RangeVar *relation)
 }

 /*
- * ReindexDatabase
- *		Recreate indexes of a database.
+ * ReindexObject
+ *		Recreate indexes of object whose type is defined by objectKind.
  *
  * To reduce the probability of deadlocks, each table is reindexed in a
  * separate transaction, so we can release the lock on it right away.
  * That means this must not be called within a user transaction block!
  */
 Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexObject(const char *objectName, ReindexObjectType objectKind)
 {
+	Oid			objectOid;
 	Relation	relationRelation;
 	HeapScanDesc scan;
+	ScanKeyData *scan_keys = NULL;
 	HeapTuple	tuple;
 	MemoryContext private_context;
 	MemoryContext old;
 	List	   *relids = NIL;
 	ListCell   *l;
+	int			num_keys;

-	AssertArg(databaseName);
+	AssertArg(objectName);
+	Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
+		   objectKind == REINDEX_OBJECT_SYSTEM ||
+		   objectKind == REINDEX_OBJECT_DATABASE);

-	if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("can only reindex the currently open database")));
+	/*
+	 * Get OID of object to reindex, being the database currently being
+	 * used by session for a database or for system catalogs, or the schema
+	 * defined by caller. At the same time do
Re: [HACKERS] New Event Trigger: table_rewrite
On Thu, Nov 20, 2014 at 10:37 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

CLUSTER and VACUUM are not part of the supported commands anymore, so I think that we could replace that by the addition of a reference number in the cell of ALTER TABLE for the event table_rewrite, and write at the bottom of the table a description of how this event behaves with ALTER TABLE. Note as well that "might or might not" is not really helpful for the user.

That's precisely why we have an event trigger here, I think --- for some subcommands, it's not easy to determine whether a rewrite happens or not. (I think SET TYPE is the one.) I don't think we want to document precisely under what condition a rewrite takes place.

Yeah, the current documentation expands to the following sentence, as browsed in http://www.postgresql.org/docs/9.3/interactive/sql-altertable.html

  As an exception, if the USING clause does not change the column contents
  and the old type is either binary coercible to the new type or an
  unconstrained domain over the new type, a table rewrite is not needed,
  but any indexes on the affected columns must still be rebuilt.

I don't think that "might or might not" is less helpful in the context of the Event Trigger, because the whole point is that the event is only triggered in case of a rewrite. Of course we could cross-link the two paragraphs or something.

2) The examples of SQL queries provided are still in lower case in the docs; that's contrary to the rest of the docs, where upper case is used for reserved keywords.

Right, being consistent trumps personal preferences; changed in the attached.

Yes please.

(nitpick) Another thing in that sample code is "not current_hour between 1 and 6". That reads strange to me. It should be equally correct to spell it as "current_hour not between 1 and 6", which seems more natural. (/nitpick)

True, fixed in the attached.
The status of this patch was not updated on the commitfest app, so I lost
track of it. Sorry for not answering earlier, btw. The following things are
to note about v5:

1) There are still mentions of VACUUM, ANALYZE and CLUSTER:
@@ -264,6 +275,10 @@ check_ddl_tag(const char *tag)
 		obtypename = tag + 6;
 	else if (pg_strncasecmp(tag, "DROP ", 5) == 0)
 		obtypename = tag + 5;
+	else if (pg_strncasecmp(tag, "ANALYZE", 7) == 0 ||
+			 pg_strncasecmp(tag, "CLUSTER", 7) == 0 ||
+			 pg_strncasecmp(tag, "VACUUM", 6) == 0)
+		return EVENT_TRIGGER_COMMAND_TAG_OK;

2) There are a couple of typos and incorrect styling, like "if(". Nothing
huge. Cleanup is done in the attached.

In any case, all the issues mentioned seem to have been addressed, so I am
switching this patch to ready for committer.

Regards,
-- 
Michael

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 6a3002f..17b0818 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -42,11 +42,14 @@
 #include "utils/syscache.h"
 #include "tcop/utility.h"

-
+/*
+ * Data Structure for sql_drop and table_rewrite Event Trigger support.
+ */
 typedef struct EventTriggerQueryState
 {
 	slist_head	SQLDropList;
 	bool		in_sql_drop;
+	Oid			table_rewrite_oid;
 	MemoryContext cxt;
 	struct EventTriggerQueryState *previous;
 } EventTriggerQueryState;
@@ -119,11 +122,14 @@ static void AlterEventTriggerOwner_internal(Relation rel,
 											HeapTuple tup,
 											Oid newOwnerId);
 static event_trigger_command_tag_check_result check_ddl_tag(const char *tag);
+static event_trigger_command_tag_check_result check_table_rewrite_ddl_tag(
+	const char *tag);
 static void error_duplicate_filter_variable(const char *defname);
 static Datum filter_list_to_array(List *filterlist);
 static Oid insert_event_trigger_tuple(char *trigname, char *eventname,
 									  Oid evtOwner, Oid funcoid, List *tags);
 static void validate_ddl_tags(const char *filtervar, List *taglist);
+static void validate_table_rewrite_tags(const char *filtervar, List *taglist);
 static void EventTriggerInvoke(List *fn_oid_list, EventTriggerData *trigdata);

 /*
@@ -154,7 +160,8 @@ CreateEventTrigger(CreateEventTrigStmt *stmt)
 	/* Validate event name. */
 	if (strcmp(stmt->eventname, "ddl_command_start") != 0 &&
 		strcmp(stmt->eventname, "ddl_command_end") != 0 &&
-		strcmp(stmt->eventname, "sql_drop") != 0)
+		strcmp(stmt->eventname, "sql_drop") != 0 &&
+		strcmp(stmt->eventname, "table_rewrite") != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("unrecognized event name \"%s\"",
@@ -183,6 +190,9 @@ CreateEventTrigger(CreateEventTrigStmt *stmt)
 			 strcmp(stmt->eventname, "sql_drop") == 0) &&
 			tags != NULL)
 		validate_ddl_tags("tag", tags);
+	else if (strcmp(stmt->eventname, "table_rewrite") == 0 &&
+			 tags != NULL)
+		validate_table_rewrite_tags("tag", tags);

 	/*
 	 * Give user a nice error message if an event trigger of the same name