[HACKERS] Concurrently option for reindexdb
Hi all,

Attached WIP patch adds a -C (--concurrently) option to the reindexdb command for concurrent reindexing. If we specify the -C option with any table, reindexdb does the reindexing concurrently, taking the minimum locks necessary. Note that we cannot use the '-s' option (for system catalogs) and the '-C' option at the same time.

The patch uses a simple method, done via SQL, based on what pg_repack (or pg_reorg) does:

1. CREATE INDEX CONCURRENTLY a new index which has the same definition as the target index
2. Acquire an ACCESS EXCLUSIVE lock on the target table (and a transaction starts)
3. Swap the old and new indexes
4. Drop the old index
5. COMMIT

ToDo:
- Multi-language support for log messages.

Regards,

---
Sawada Masahiko

reindexdb-concurrently_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
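The five steps above can be sketched in SQL roughly as follows. The index, table, and column names here are hypothetical, and the exact swap mechanism (rename vs. catalog swap) is an assumption for illustration — the patch itself generates the statements internally:

```sql
-- Step 1: build a replacement index without blocking writes.
CREATE INDEX CONCURRENTLY idx_new ON tab (col);  -- same definition as idx

-- Steps 2-5: swap under an exclusive lock, then drop the old index.
BEGIN;
LOCK TABLE tab IN ACCESS EXCLUSIVE MODE;
ALTER INDEX idx RENAME TO idx_old;
ALTER INDEX idx_new RENAME TO idx;
DROP INDEX idx_old;
COMMIT;
```

The exclusive lock in step 2 is exactly what Michael's reply below objects to: only the index build is concurrent; the swap itself blocks all access to the table for its (short) duration.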
Re: [HACKERS] Concurrently option for reindexdb
On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
> Attached WIP patch adds -C (--concurrently) option for reindexdb command for concurrent reindexing. If we specify the -C option with any table then reindexdb does the reindexing concurrently with the minimum lock necessary. Note that we cannot use the '-s' option (for system catalogs) and the '-C' option at the same time. This patch uses a simple method as follows.
> 1. Do CREATE INDEX CONCURRENTLY for a new index which has the same definition as the target index
> 2. Acquire an ACCESS EXCLUSIVE lock on the target table (and a transaction starts)
> 3. Swap the old and new index
> 4. Drop the old index
> 5. COMMIT
> This process is based on what pg_repack (or pg_reorg) does, done via SQL.

This would be useful for users, but I am not sure that you can call that --concurrently, as the rename/swap phase requires an exclusive lock, and you would actually block a real implementation of REINDEX CONCURRENTLY (hm...).

> ToDo - Multi-language support for log messages.

Why? I am not sure that's something you should deal with.

--
Michael
Re: [HACKERS] LIMIT for UPDATE and DELETE
Hi Rukh,

(2014/08/15 6:18), Rukh Meski wrote:
> Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently.

Before looking into the patch, I'd like to know the use cases in more detail.

Thanks,

Best regards,
Etsuro Fujita
[HACKERS] minor config doc update
Find a small documentation patch attached:
- show the valid range for checkpoint_timeout
- remove one spurious empty line (compared to other descriptions)

--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f23e5dc..49547ee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2274,8 +2274,9 @@ include_dir 'conf.d'
      </term>
      <listitem>
       <para>
-        Maximum time between automatic WAL checkpoints, in
-        seconds. The default is five minutes (<literal>5min</>).
+        Maximum time between automatic WAL checkpoints, in seconds.
+        The valid range is between 30 seconds and one hour.
+        The default is five minutes (<literal>5min</>).
         Increasing this parameter can increase the amount of time
         needed for crash recovery.
         This parameter can only be set in the <filename>postgresql.conf</>
@@ -2294,7 +2295,6 @@ include_dir 'conf.d'
       <para>
        Specifies the target of checkpoint completion, as a fraction of
        total time between checkpoints. The default is 0.5.
-
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
       </para>
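For reference, the setting whose description is being amended looks like this in postgresql.conf; the range and default shown in the comments are the ones the patch documents (the neighboring checkpoint_completion_target line is included only for context):

```
#checkpoint_timeout = 5min              # range 30s-1h; default is five minutes
#checkpoint_completion_target = 0.5     # checkpoint target duration, as a
                                        # fraction of checkpoint interval
```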
Re: [HACKERS] Concurrently option for reindexdb
On Mon, Aug 25, 2014 at 3:48 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
>> Attached WIP patch adds -C (--concurrently) option for reindexdb command for concurrent reindexing. If we specify the -C option with any table then reindexdb does the reindexing concurrently with the minimum lock necessary. Note that we cannot use the '-s' option (for system catalogs) and the '-C' option at the same time. This patch uses a simple method as follows.
>> 1. Do CREATE INDEX CONCURRENTLY for a new index which has the same definition as the target index
>> 2. Acquire an ACCESS EXCLUSIVE lock on the target table (and a transaction starts)
>> 3. Swap the old and new index
>> 4. Drop the old index
>> 5. COMMIT
>> This process is based on what pg_repack (or pg_reorg) does, done via SQL.
>
> This would be useful for users, but I am not sure that you can call that --concurrently, as the rename/swap phase requires an exclusive lock, and you would actually block a real implementation of REINDEX CONCURRENTLY (hm...).

Right, it might be difficult to call this --concurrently; the option might need a different name.

>> ToDo - Multi-language support for log messages.
>
> Why? I am not sure that's something you should deal with.

The log messages which already exist have multi-language support via the .po files, but the newly added messages do not yet have corresponding entries there, I thought.

Regards,

---
Sawada Masahiko
Re: [HACKERS] Hardening pg_upgrade
--On 21. August 2014 22:08:58 +0200 Magnus Hagander mag...@hagander.net wrote:
>> I vote for discarding 8.3 support in pg_upgrade. There are already enough limitations on pg_upgrade from pre-8.4 to make it of questionable value; if it's going to create problems like this, it's time to cut the rope.
>
> +1. 8.3 has been unsupported for a fairly long time now, and you can still do a two-step upgrade if you're on that old a version.

Also +1 from my side. I've seen some old 8.3 installations at customers, still, but they aren't large and can easily be upgraded with a two-step upgrade.

--
Thanks

Bernd
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
On 08/25/2014 12:49 AM, johnlumby wrote:
> On 08/19/14 18:27, Heikki Linnakangas wrote:
>> Please write the patch without atomic CAS operation. Just use a spinlock.
>
> Umm, this is a new criticism I think.

Yeah. Be prepared that new issues will crop up as the patch gets slimmer and easier to review :-). Right now there's still so much chaff that it's difficult to see the wheat.

> I use CAS for things other than locking, such as add/remove from shared queue. I suppose maybe a spinlock on the entire queue can be used equivalently, but with more code (extra confusion) and worse performance (coarser serialization). What is your objection to using gcc's atomic ops? Portability?

Yeah, portability. Atomic ops might make sense, but at this stage it's important to slim down the patch to the bare minimum, so that it's easier to review. Once the initial patch is in, you can improve it with additional patches.

>> There's a patch in the commitfest to add support for that,
>
> sorry, support for what? There are already spinlocks in postgresql, you mean some new kind? please point me at it with hacker msgid or something.

Atomic ops: https://commitfest.postgresql.org/action/patch_view?id=1314

Once that's committed, you can use the new atomic ops in your patch. But until then, stick to spinlocks.

> On 08/19/14 19:10, Claudio Freire wrote:
>> On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>> Also, please split prefetching of regular index scans into a separate patch. ...
>>
>> That patch already happened on the list, and it wasn't a win in many cases. I'm not sure it should be proposed independently of this one. Maybe a separate patch, but it should be considered dependent on this. I don't have an archive link at hand atm, but I could produce one later.
>
> Several people have asked to split this patch into several smaller ones and I have thought about it. It would introduce some awkward dependencies. E.g. splitting the scanner code (index, relation-heap) into a separate patch from the aio code would mean that the scanner patch becomes dependent on the aio patch. They are not quite orthogonal.

Right now, please focus on the main AIO patch. That should show a benefit for bitmap heap scans too, so to demonstrate the benefits of AIO, you don't need to prefetch regular index scans. The main AIO patch can be written, performance tested, and reviewed without caring about regular index scans at all.

> The reason is that the scanners call a new function, DiscardBuffer(blockid), when aio is in use. We can get around it by providing a stub for that function in the scanner patch, but then there is some risk of someone getting the wrong version of that function in their build. It just adds yet more complexity and breakage opportunities.

Regardless of the regular index scans, we'll need to discuss the new API of PrefetchBuffer and DiscardBuffer. It would be simpler for the callers if you didn't require the DiscardBuffer calls. I think it would be totally feasible to write the patch that way. Just drop the buffer pin after the asynchronous read finishes. When the caller actually needs the page, it will call ReadBuffer which will pin it again. You'll get a little bit more bufmgr traffic that way, but I think it'll be fine in practice.

> One further comment concerning these BufferAiocb and aiocb control blocks being in shared memory: I explained above that the BufferAiocb must be in shared memory for wait/post. The aiocb does not necessarily have to be in shared memory, but since there is a one-to-one relationship between BufferAiocb and aiocb, it makes the code much simpler, and the operation much more efficient, if the aiocb is embedded in the BufferAiocb as I have it now. E.g. if the aiocb is in private-process memory, then an additional allocation scheme is needed (fixed number? palloc()in'g extra ones as needed? ...) which adds complexity,

Yep, use palloc or a fixed pool. There's nothing wrong with that.

> and probably some inefficiency since a shared pool is usually more efficient (allows higher maximum per process etc), and there would have to be some pointer de-referencing from BufferAiocb to aiocb adding some (small) overhead.

I think you're falling into the domain of premature optimization. A few pointer dereferences are totally insignificant, and the amount of memory you're saving pales in comparison to other similar non-shared pools and caches we have (catalog caches, for example). And on the other side of the coin, with a shared pool you'll waste memory when async I/O is not used (e.g. because everything fits in RAM), and when it is used, you'll have more contention on locks and cache lines when multiple processes use the same objects. The general guideline in PostgreSQL is that everything is backend-private, except structures used to communicate between backends.

- Heikki
Re: [HACKERS] implement subject alternative names support for SSL connections
On 08/24/2014 03:11 PM, Alexey Klyukin wrote:
> On Wed, Aug 20, 2014 at 11:53 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> On 07/25/2014 07:10 PM, Alexey Klyukin wrote:
>>> Greetings, I'd like to propose a patch for checking subject alternative names entry in the SSL certificate for DNS names during SSL authentication.
>>
>> Thanks! I just ran into this missing feature last week, while working on my SSL test suite. So +1 for having the feature. This patch needs to be rebased over current master branch, thanks to my refactoring that moved all OpenSSL-specific stuff to be-secure-openssl.c.
>
> The patch is rebased against fe-secure-openssl.c (that's where verify_peer_name_matches_certificate appeared in the master branch). I've changed the condition in the for loop to be less confusing (thanks to comments from Magnus and Tom), making an explicit break once a match is detected.

The patch doesn't seem to support wildcards in alternative names. Is that on purpose?

It would be good to add a little helper function that does the NULL-check, straight comparison, and wildcard check, for a single name. And then use that for the Common Name and all the Alternatives. That'll ensure that all the same rules apply whether the name is the Common Name or an Alternative (assuming that the rules are supposed to be the same; I don't know if that's true).

But actually, I wonder if we should delegate the whole hostname matching to OpenSSL? There's a function called X509_check_host for that, although it's new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep the current code to handle older versions.

- Heikki
Re: [HACKERS] implement subject alternative names support for SSL connections
On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote:
> But actually, I wonder if we should delegate the whole hostname matching to OpenSSL? There's a function called X509_check_host for that, although it's new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep the current code to handle older versions.

Given that we're about to add support for other SSL implementations I'm not sure that that's a good idea. IIRC there exist quite a few different interpretations of what denotes a valid cert between the libraries. Doesn't sound fun to me.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] implement subject alternative names support for SSL connections
On 08/25/2014 01:07 PM, Andres Freund wrote:
> On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote:
>> But actually, I wonder if we should delegate the whole hostname matching to OpenSSL? There's a function called X509_check_host for that, although it's new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep the current code to handle older versions.
>
> Given that we're about to add support for other SSL implementations I'm not sure that that's a good idea. IIRC there exist quite a bit of different interpretations about what denotes a valid cert between the libraries.

Really? That sounds scary. I can imagine that some libraries support more complicated stuff like Internationalized Domain Names, while others don't, but as long as they all behave the same with the basic stuff, I think that's acceptable.

> Doesn't sound fun to me.

As long as just this patch is concerned, I agree it's easier to just implement it ourselves, but if we want to start implementing more complicated rules, then I'd rather not get into that business at all, and let the SSL library vendor deal with the bugs and CVEs.

I guess we'll go ahead with this patch for now, but keep this in mind if someone wants to complicate the rules further in the future.

- Heikki
[HACKERS] Switch pg_basebackup to use -X stream instead of -X fetch by default?
Hi,

Currently pg_basebackup uses fetch mode when only -x is specified - which imo isn't a very good thing to use due to the increased risk of not fetching everything. How about switching to stream mode for 9.5+?

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Escaping from blocked send() reprised.
On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas robertmh...@gmail.com wrote in CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=wc8...@mail.gmail.com
>> 1. I think it's the case that there are platforms around where a signal won't cause send() to return EINTR and I'd be entirely unsurprised if SSL_write() doesn't necessarily return EINTR in that case. I'm not sure what, if anything, we can do about that.

We use a custom write routine with SSL_write, where we call send() ourselves, so that's not a problem as long as we put the check in the right place (in secure_raw_write(), after my recent SSL refactoring - the patch needs to be rebased).

> man 2 send on FreeBSD has no description of EINTR. And even on Linux, send won't return EINTR in most cases; at least I haven't seen it. So send() = -1 with EINTR seems to me to be only an equivalent of send() = 0. I have no idea what the implementer thought the difference was.

As the patch stands, there's a race condition: if the SIGTERM arrives *before* the send() call, the send() won't return EINTR anyway. So there's a chance that you still block. Calling pg_terminate_backend() again will dislodge it (assuming send() returns with EINTR on signal), but I don't think we want to define the behavior as "usually, pg_terminate_backend() will kill a backend that's blocked on sending to the client, but sometimes you have to call it twice (or more!) to really kill it".

A more robust way is to set ImmediateInterruptOK before calling send(). That wouldn't let you send data that can be sent without blocking, though. For that, you could put the socket into non-blocking mode, and sleep with select(), also waiting for the process' latch at the same time (die() sets the latch, so that will wake up the select() if a termination request arrives).

Is it actually safe to process the die interrupt where send() is called? ProcessInterrupts() does ereport(FATAL, ...), which will attempt to send a message to the client. If that happens in the middle of constructing some other message, that will violate the protocol.

>> 2. I think it would be reasonable to try to kill off the connection without notifying the client if we're unable to send the data to the client in a reasonable period of time. But I'm unsure what a "reasonable period of time" means. This patch would basically do it after no delay at all, which seems like it might be too aggressive. However, I'm not sure.
>
> I think there's no such reasonable time.

I agree it's pretty hard to define any reasonable timeout here. I think it would be fine to just cut the connection; even if you don't block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS() somewhere higher in the stack and kill the connection almost as abruptly anyway. (you can't violate the protocol, however)

- Heikki
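For context, the termination being discussed here is the one requested from SQL via the pg_terminate_backend() function, which sends SIGTERM to the target backend. A typical invocation (the pg_stat_activity filter shown is just one illustrative way to pick a victim, not part of the thread):

```sql
-- Ask another backend to exit; returns true if the signal was sent.
-- Whether the backend actually dies promptly is exactly the question
-- debated above, when it is blocked in send() to a slow client.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND state = 'active';
```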
Re: [HACKERS] LIMIT for UPDATE and DELETE
On Mon, Aug 25, 2014 at 12:18 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/08/15 6:18), Rukh Meski wrote:
>> Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently.
>
> Before looking into the patch, I'd like to know the use cases in more detail.

You could check the previous commitfest thread [1] for this feature; there you can find some use cases and the difficulties in implementing it, which might aid your review.

[1] https://commitfest.postgresql.org/action/patch_view?id=1412

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
Etsuro Fujita wrote:
> Done. (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is, though.) Other changes:
> * Address the comments from Eitoku-san.
> * Add regression tests.
> * Fix a bug, which fails to show the actual row counts in EXPLAIN ANALYZE for UPDATE/DELETE without a RETURNING clause.
> * Rebase to HEAD.
> Please find attached an updated version of the patch.

Here is my review:

The patch applies fine, builds without warning and passes "make check", so the ABC of patch reviewing is fine.

I played with it, and apart from Hanada's comments I have found the following:

test= EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id 3;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising. Shouldn't these be similar to the actual execution time, since most of the time is spent in the foreign scan node?

Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed during postgresBeginForeignScan rather than during postgresIterateForeignScan. It probably does not matter, but is there a reason to do it differently from the normal scan?

It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE is pushed down, right? Maybe it would make sense to add an assertion for that.

I ran a simple performance test and found that performance is improved as expected; updating 10 rows took 1000 rather than 8000 ms, and DELETING the same amount took 200 instead of 6500 ms.

Yours,
Laurenz Albe
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On 07/12/2014 05:16 AM, Jeff Davis wrote:
> I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did:
>
> explain analyze select similar_escape('','#') from t;
>
> which was the worst reasonable case I could think of. (It appears that selecting from a table is faster than from generate_series. I'm curious what you use when testing the performance of an individual function at the SQL level.)

A large table like that is what I usually do. A large generate_series() spends a lot of time building the tuplestore, especially if it doesn't fit in work_mem and spills to disk. Sometimes I use this to avoid it:

explain analyze select similar_escape('','#') from generate_series(1, 1) a, generate_series(1,1000);

although in my experience it still has somewhat more overhead than a straight seqscan.

- Heikki
Re: [HACKERS] LIMIT for UPDATE and DELETE
Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/08/15 6:18), Rukh Meski wrote:
>> Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently.
>
> Before looking into the patch, I'd like to know the use cases in more details.

There have been a few times I wanted something like this, so that it was easier to do an update that affects a very high percentage of rows in a table, while making the old version of the row no longer match the selection criteria for the UPDATE. There are workarounds using cursors or subselects returning ctid, but they are kludgy and error prone. Basically I wanted to alternate UPDATE of a subset of the rows with table VACUUM so that subsequent iterations can re-use space and avoid bloating the table.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
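The ctid-subselect workaround Kevin alludes to can be sketched like this (the table name, predicate, and batch size are hypothetical; the batching loop itself would live in application code or a DO block):

```sql
-- Delete matching rows in bounded batches via a ctid subselect,
-- interleaving VACUUM so later batches can reuse the freed space.
DELETE FROM big_table
WHERE ctid IN (
    SELECT ctid
    FROM big_table
    WHERE needs_purge
    LIMIT 10000
);
VACUUM big_table;
-- repeat until DELETE reports 0 rows affected
```

With the feature under discussion, each batch would presumably reduce to something like `DELETE FROM big_table WHERE needs_purge LIMIT 10000;`, avoiding the extra scan and the error-prone subselect.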
Re: [HACKERS] Hardening pg_upgrade
Bernd Helmle maili...@oopsware.de wrote:
> Magnus Hagander mag...@hagander.net wrote:
>>> I vote for discarding 8.3 support in pg_upgrade. There are already enough limitations on pg_upgrade from pre-8.4 to make it of questionable value; if it's going to create problems like this, it's time to cut the rope.
>>
>> +1. 8.3 has been unsupported for a fairly long time now, and you can still do a two-step upgrade if you're on that old a version.
>
> Also +1 from my side. I've seen some old 8.3 installations at customers, still, but they aren't large and can easily be upgraded with a two step upgrade.

+1

If we could leave it without it being any extra work, fine; but once a release is over a year out of support, if it's a matter of putting extra work on the pg hackers or on the users who have chosen to wait more than a year after support ends to do the upgrade, I'm OK with asking those users to do a two-phase upgrade or fall back to pg_dump. It's not like we're leaving them without any options.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
Heikki Linnakangas hlinnakan...@vmware.com writes:
> On 07/12/2014 05:16 AM, Jeff Davis wrote:
>> I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did:
>>
>> explain analyze select similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from t;
>>
>> which was the worst reasonable case I could think of. (It appears that selecting from a table is faster than from generate_series. I'm curious what you use when testing the performance of an individual function at the SQL level.)
>
> A large table like that is what I usually do. A large generate_series() spends a lot of time building the tuplestore, especially if it doesn't fit in work_mem and spills to disk. Sometimes I use this to avoid it:
>
> explain analyze select similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from generate_series(1, 1) a, generate_series(1,1000);
>
> although in my experience it still has somewhat more overhead than a straight seqscan.

[ scratches head... ] Surely similar_escape is marked immutable, and will therefore be executed exactly once in either of these formulations, because the planner will fold the expression to a constant.

regards, tom lane
Re: [HACKERS] Built-in binning functions
Hi,

I finally had some time to get back to this. I attached version 3 of the patch, which addresses Tom's complaint about the int8 version by removing it, as it does not seem necessary (the float8 version can handle integers just fine). This patch now basically has just one optimized function for float8 and one for the numeric datatypes, just like width_bucket.

On 08/07/14 02:14, Tom Lane wrote:
> Also, I'm not convinced by this business of throwing an error for a NULL array element. Per spec, null arguments to width_bucket() produce a null result, not an error --- shouldn't this flavor act similarly? In any case I think the test needs to use array_contains_nulls() not just ARR_HASNULL.

I really think there should be a difference between a NULL array and a NULL inside the array, and that a NULL inside the array is wrong input. I changed the check to array_contains_nulls() though.

On 08/07/14 02:14, Tom Lane wrote:
> Lastly, the spec defines behaviors for width_bucket that allow either ascending or descending buckets. We could possibly do something similar

I am not sure it's worth it here, as we require the input to be sorted anyway. It might be worthwhile if we decided to do this as an aggregate (since there the input would not be required to be presorted) instead of a function, but I am not sure that's desirable either, as it would limit usability to only the single main use-case (group by and count()).

On 20/07/14 11:01, Simon Riggs wrote: On 16 July 2014 20:35, Pavel Stehule pavel.steh...@gmail.com wrote: On 08/07/14 02:14, Tom Lane wrote: I didn't see any discussion of the naming question in this thread. I'd like to propose that it should be just width_bucket(); we can easily determine which function is meant, considering that the SQL-spec variants don't take arrays and don't even have the same number of actual arguments. I did mention in the submission that the names are up for discussion; I am all for naming it just width_bucket. I had this idea too - but I am not sure it is a good idea.
A distance between ANSI SQL with_bucket and our enhancing is larger than in our implementation of median for example. I can live with both names, but current name I prefer. So I suggest that we use the more generic function name bin(), with a second choice of discretize() (though that seems annoyingly easy to spell incorrectly) I really don't think bin() is good choice here, bucket has same meaning in this context and bin is often used as shorthand for binary which would be confusing here. Anyway I currently left the name as it is, I would not be against using width_bucket() or even just bucket(), not sure about the discretize() option. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c715ca2..e28ff68 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -920,6 +920,31 @@ entryliteralwidth_bucket(5.35, 0.024, 10.06, 5)/literal/entry entryliteral3/literal/entry /row + + row + entry +indexterm + primaryvarwidth_bucket/primary +/indexterm +literalfunctionvarwidth_bucket(parameterop/parameter typenumeric/type, parameterthresholds/parameter typenumerc[]/type)/function/literal/entry + entrytypeint/type/entry + entryreturn the bucket to which parameteroperand/ would + be assigned based on right-bound bucket parameterthresholds/, + the parameterthresholds/ must be ordered (smallest first)/entry + entryliteralvarwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric[])/literal/entry + entryliteral3/literal/entry + /row + + row + entryliteralfunctionvarwidth_bucket(parameterop/parameter typedp/type, parameterthresholds/parameter typedp[]/type)/function/literal/entry + entrytypeint/type/entry + entryreturn the bucket to which parameteroperand/ would + be assigned based on right-bound bucket parameterthresholds/, + the parameterthresholds/ must be ordered (smallest first)/entry + entryliteralvarwidth_bucket(5.35::float8, ARRAY[1, 3, 4, 
6]::float8[])/literal/entry + entryliteral3/literal/entry + /row + /tbody /tgroup /table diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 41b3eaa..fcbb623 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -2800,6 +2800,61 @@ width_bucket_float8(PG_FUNCTION_ARGS) PG_RETURN_INT32(result); } +/* + * Implements the float8 version of the varwidth_bucket() function. + * See also varwidth_bucket_general() and varwidth_bucket_int8(). + * + * 'thresholds' is an array (must be sorted from smallest to biggest value) + * containing right-bounds for each bucket, varwidth_bucket() returns + * integer indicating the bucket number that 'operand' belongs to. An operand + * greater than the upper bound is
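Pulling the examples out of the (garbled) documentation hunk above, the proposed function assigns an operand to a bucket based on an ordered array of right-bound thresholds:

```sql
-- Both examples and their results are taken verbatim from the patch's
-- documentation table; the thresholds array must be sorted smallest-first.
SELECT varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric[]);  -- 3
SELECT varwidth_bucket(5.35::float8, ARRAY[1, 3, 4, 6]::float8[]);    -- 3
```

That is, 5.35 falls between the right bounds 4 and 6, so it lands in bucket 3 — analogous to the standard width_bucket(), but with explicitly listed, variable-width bucket boundaries.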
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On 07/12/2014 05:16 AM, Jeff Davis wrote: On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Attached is a small patch to $SUBJECT. In master, only single-byte characters are allowed as an escape. Of course, with the patch it must still be a single character, but it may be multi-byte. I'm concerned about the performance cost of this patch. Have you done any measurements about what kind of overhead you are putting on the inner loop of similar_escape? I didn't consider this very performance critical, because this is looping through the pattern, which I wouldn't expect to be a long string. On my machine using en_US.UTF-8, the difference is imperceptible for a SIMILAR TO ... ESCAPE query. I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did: explain analyze select similar_escape('','#') from t; which was the worst reasonable case I could think of. Actually, that gets optimized to a constant in the planner: postgres=# explain verbose select similar_escape('','#') from t; QUERY PLAN -- Seq Scan on public.t (cost=0.00..144247.85 rows=985 width=0) Output: '^(?: )$'::text Planning time: 0.033 ms (3 rows) With a working test case: create table t (pattern text); insert into t select '' from generate_series(1, 100); vacuum t; explain (analyze) select similar_escape(pattern,'#') from t; your patch seems to be about 2x-3x as slow as unpatched master. So this needs some optimization. A couple of ideas: 1. If the escape string is in fact a single-byte character, you can proceed with the loop just as it is today, without the pg_mblen calls. 2. Since pg_mblen() will always return an integer between 1-6, it would probably be faster to replace the memcpy() and memcmp() calls with simple for-loops iterating byte-by-byte. In very brief testing, with the 1. change above, the performance with this patch is back to what it's without the patch. See attached. 
- Heikki

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index caf45ef..dd46325 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -688,11 +688,16 @@ similar_escape(PG_FUNCTION_ARGS)
 		elen = VARSIZE_ANY_EXHDR(esc_text);
 		if (elen == 0)
 			e = NULL;			/* no escape character */
-		else if (elen != 1)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
-					 errmsg("invalid escape string"),
-					 errhint("Escape string must be empty or one character.")));
+		else
+		{
+			int			escape_mblen = pg_mbstrlen_with_len(e, elen);
+
+			if (escape_mblen > 1)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
+						 errmsg("invalid escape string"),
+						 errhint("Escape string must be empty or one character.")));
+		}
 	}
 
 /*--
@@ -723,59 +728,94 @@ similar_escape(PG_FUNCTION_ARGS)
 	while (plen > 0)
 	{
 		char		pchar = *p;
+		int			mblen;
 
-		if (afterescape)
+		/*
+		 * If the escape string is a single-byte character, we can process
+		 * the pattern one byte at a time, ignoring multi-byte characters.
+		 * (This works because all server encodings have the property that a
+		 * non-first byte of a multi-byte character always has the high bit
+		 * set, and hence we cannot be fooled by a byte in the middle of a
+		 * multi-byte character.)
+		 */
+		if (elen == 1 || (mblen = pg_mblen(p)))
 		{
-			if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
-				*r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
-			else
+			if (afterescape)
+			{
+				if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
+					*r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
+				else
+				{
+					*r++ = '\\';
+					*r++ = pchar;
+				}
+				afterescape = false;
+			}
+			else if (e && pchar == *e)
+			{
+				/* SQL99 escape character; do not send to output */
+				afterescape = true;
+			}
+			else if (incharclass)
+			{
+				if (pchar == '\\')
+					*r++ = '\\';
+				*r++ = pchar;
+				if (pchar == ']')
+					incharclass = false;
+			}
+			else if (pchar == '[')
+			{
+				*r++ = pchar;
+				incharclass = true;
+			}
+			else if (pchar == '%')
+			{
+				*r++ = '.';
+				*r++ = '*';
+			}
+			else if (pchar == '_')
+				*r++ = '.';
+			else if (pchar == '(')
+			{
+				/* convert to non-capturing parenthesis */
+				*r++ = '(';
+				*r++ = '?';
+				*r++ = ':';
+			}
+			else if (pchar == '\\' || pchar == '.' ||
+					 pchar == '^' || pchar == '$')
 			{
 				*r++ = '\\';
 				*r++ = pchar;
 			}
-			afterescape = false;
-		}
-		else if
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On 08/25/2014 04:48 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 07/12/2014 05:16 AM, Jeff Davis wrote: I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did: explain analyze select similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from t; which was the worst reasonable case I could think of. (It appears that selecting from a table is faster than from generate_series. I'm curious what you use when testing the performance of an individual function at the SQL level.) A large table like that is what I usually do. A large generate_series() spends a lot of time building the tuplestore, especially if it doesn't fit in work_mem and spills to disk. Sometimes I use this to avoid it: explain analyze select similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from generate_series(1, 1) a, generate_series(1,1000); although in my experience it still has somewhat more overhead than a straight seqscan because. [ scratches head... ] Surely similar_escape is marked immutable, and will therefore be executed exactly once in either of these formulations, because the planner will fold the expression to a constant. Yeah, just noticed that myself.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unwanted LOG during recovery of DROP TABLESPACE REDO
On 08/15/2014 12:31 PM, Marko Tiikkaja wrote: On 7/16/14 4:33 PM, Tom Lane wrote: Rajeev rastogi rajeev.rast...@huawei.com writes: I found and fixed a bug that causes recovery (crash recovery, PITR) to emit an unwanted LOG message if the tablespace symlink is not found during the processing of DROP TABLESPACE redo. LOG: could not remove symbolic link pg_tblspc/16384: No such file or directory I don't think that's a bug: it's the designed behavior. Why should we complicate the code to not print a log message in a situation where it's unclear if the case is expected or not? I agree with Tom here; this doesn't seem like an improvement. Well, for comparison, we also silently ignore non-existent files when replaying a DROP TABLE. I could go either way myself, but this is clearly a very minor thing, and we have two -1's, so I'm marking this as Rejected in the commitfest. Thanks anyway! - Heikki
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
Heikki Linnakangas hlinnakan...@vmware.com writes: On 08/25/2014 04:48 PM, Tom Lane wrote: [ scratches head... ] Surely similar_escape is marked immutable, and will therefore be executed exactly once in either of these formulations, because the planner will fold the expression to a constant. Yeah, just noticed that myself.. ... although, given that, it is also fair to wonder how much the speed of similar_escape really matters. Surely in most use-cases the pattern and escape char will be constants. And, when they are not, wouldn't the subsequent parsing work for the regexes dominate the cost of similar_escape anyway? IOW, I'm not sure we should be advising Jeff to duplicate code in order to have a fast path. Keeping it short might be the best goal. regards, tom lane
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
Thomas Munro wrote: On 22 August 2014 23:02, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Forgive me if I have misunderstood but it looks like your incremental patch included a couple of unrelated changes, namely s/0/InvalidCommandId/ and a reversion of ConditionalMultiXactIdWait. Yeah, sorry about those, will push separately. Undoing those gave me skip-locked-v12-b.patch, attached for reference, and I've included those changes in a new full patch skip-locked-v13.patch (also rebased). +1 for the change from if-then-else to switch statements. I was less sure about the 'goto failed' change, but I couldn't measure any change in concurrent throughput in my simple benchmark, so I guess that extra buffer lock/unlock doesn't matter, and I can see why you prefer that control flow. I was also thinking of reducing the lock level acquired to shared rather than exclusive in all the paths that goto failed. Since the lock is only used to read a couple of fields from the tuple, shared is enough and should give slightly better concurrency. Per buffer access rules in src/backend/storage/buffer/README:
: 1. To scan a page for tuples, one must hold a pin and either shared or
: exclusive content lock. To examine the commit status (XIDs and status bits)
: of a tuple in a shared buffer, one must likewise hold a pin and either shared
: or exclusive lock.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On 08/25/2014 06:14 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 08/25/2014 04:48 PM, Tom Lane wrote: [ scratches head... ] Surely similar_escape is marked immutable, and will therefore be executed exactly once in either of these formulations, because the planner will fold the expression to a constant. Yeah, just noticed that myself.. ... although, given that, it is also fair to wonder how much the speed of similar_escape really matters. Surely in most use-cases the pattern and escape char will be constants. And, when they are not, wouldn't the subsequent parsing work for the regexes dominate the cost of similar_escape anyway? IOW, I'm not sure we should be advising Jeff to duplicate code in order to have a fast path. Keeping it short might be the best goal. It's certainly not worth bending over backwards for a small performance gain here, but I think special-casing a single-byte escape sequence is still quite reasonable. It doesn't make the code any longer. - Heikki
Re: [HACKERS] pg_stat_statements cluttered with DEALLOCATE dbdpg_p*
On 07/20/2014 11:51 PM, Peter Geoghegan wrote: On Sun, Jul 20, 2014 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, this is certainly a behavioral change. Perhaps squeeze it into 9.4, but not the back branches? +1 Ok, done. (We're a month closer to releasing 9.4 than we were when this consensus was reached, but I think it's still OK...) - Heikki
Re: [HACKERS] psql \watch versus \timing
On 08/18/2014 10:51 AM, Michael Paquier wrote: On Mon, Aug 18, 2014 at 4:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 18, 2014 at 3:19 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Aug 14, 2014 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote: Attached patch changes \watch so that it displays how long the query takes if \timing is enabled. I didn't refactor PSQLexec and SendQuery into one routine because the contents of those functions are not so same. I'm not sure how much it's worth doing that refactoring. Anyway this feature is quite useful even without that refactoring, I think. The patch applies correctly and it does correctly what it is made for: =# \timing Timing is on. =# select 1; ?column? -- 1 (1 row) Time: 0.407 ms =# \watch 1 Watch every 1sMon Aug 18 15:17:41 2014 ?column? -- 1 (1 row) Time: 0.397 ms Watch every 1sMon Aug 18 15:17:42 2014 ?column? -- 1 (1 row) Time: 0.615 ms Refactoring it would be worth it thinking long-term... And printing the timing in PSQLexec code path is already done in SendQuery, so that's doing two times the same thing IMHO. Now, looking at the patch, introducing the new function PSQLexecInternal with an additional parameter to control the timing is correct choosing the non-refactoring way of doing. But I don't think that printing the time outside PSQLexecInternal is consistent with SendQuery. Why not simply control the timing with a boolean flag and print the timing directly in PSQLexecInternal? Because the timing needs to be printed after the query result. Thanks for pointing that. Yes this makes the refactoring a bit more difficult. Michael reviewed this, so I'm marking this as Ready for Committer. Since you're a committer yourself, I expect you'll take it over from here. I agree that refactoring this would be nice in the long-term, and I also agree that it's probably OK as it is in the short-term. I don't like the name PSQLexecInternal, though. PSQLexec is used for internal commands anyway. 
In fact it's backwards, because PSQLexecInternal is used for non-internal queries, given by \watch, while PSQLexec is used for internal commands. - Heikki
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
Le 21/08/2014 10:17, Julien Rouhaud a écrit : Hello, Attached patch implements the following TODO item: Track number of WAL files ready to be archived in pg_stat_archiver. However, it will track the total number of any file ready to be archived, not only WAL files. Please let me know what you think about it. Regards. Hi, Counting the archive-ready files might be more efficient if it is done in the view definition through a function. This avoids any issues with incrementing/decrementing archiverStats.ready_count, and the patch will be simpler. Also, I don't think we need another memory allocation for that: the counter information is always available as the number of .ready files in the archive_status directory, and a call to pg_stat_archiver doesn't need to be high-performance. For example, have a new function called pg_stat_get_archive_ready_count() that does the same as what you added into pgstat_read_statsfiles(), with pg_stat_archiver defined as follows: CREATE VIEW pg_stat_archiver AS s.failed_count, s.last_failed_wal, s.last_failed_time, pg_stat_get_archive_ready_count() AS ready_count, s.stats_reset FROM pg_stat_get_archiver() s; The function pg_stat_get_archive_ready_count() will also be available for any other querying. -- Gilles Darold http://dalibo.com - http://dalibo.org
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On 07/31/2014 12:29 AM, Thomas Munro wrote: On 29 July 2014 02:35, Alvaro Herrera alvhe...@2ndquadrant.com wrote: David Rowley wrote: I've also been looking at the isolation tests and I see that you've added a series of tests for NOWAIT. I was wondering why you did that as that's really existing code; probably if you thought the tests were a bit thin around NOWAIT then maybe that should be a separate patch? The isolation tester is new so we don't have nearly enough tests for it. Adding more meaningful tests is good even if they're unrelated to the patch at hand. Here are my isolation tests for NOWAIT as a separate patch, independent of SKIP LOCKED. They cover the tuple lock, regular row lock and multixact row lock cases. Thanks, committed. I guess this might be called white box testing, since it uses knowledge of how to construct schedules that hit the three interesting code paths that trigger the error, even though you can't see from the output why the error was raised in each case without extra instrumentation (though it did cross my mind that it could be interesting at the very least for testing if the error message were different in each case). Yeah, seems reasonable. This kind of test might become obsolete in the future if the internals change a lot, so that e.g. we don't use multixids anymore. But even if that happens, there's little harm in keeping the tests, and meanwhile it's good to have coverage of these cases. - Heikki
Re: [HACKERS] improving speed of make check-world
On 08/15/2014 08:45 AM, Peter Eisentraut wrote: make check-world creates a temporary installation in every subdirectory it runs a test in, which is stupid: it's very slow and uses a lot of disk space. It's enough to do this once per run. That is the essence of what I have implemented. It cuts the time for make check-world in half or less, and it saves gigabytes of disk space. Nice! The idea is that we only maintain one temporary installation under the top-level directory. By looking at the variable MAKELEVEL, we can tell whether we are the top-level make invocation. If so, make a temp installation. If not, we know that the upper layers have already done it and we can just point to the existing temp installation. I do this by ripping out the temp installation logic from pg_regress and implementing it directly in the makefiles. This is much simpler and has additional potential benefits: The pg_regress temp install mode is actually a combination of two functionalities: temp install plus temp instance. Until now, you could only get the two together, but the temp instance functionality is actually quite useful by itself. It opens up the possibility of implementing make check for external pgxs modules, for example. Also, you could now run the temp installation step using parallel make, making it even faster. This was previously disabled because the make flags would have to pass through pg_regress. It still won't quite work to run make check-world -j8, say, because we can't actually run the tests in parallel (yet), but something like make -C src/test/regress check -j8 should work. To that end, I have renamed the pg_regress --temp-install option to --temp-instance. Since --temp-install is only used inside the source tree, this shouldn't cause any compatibility problems. Yeah, that all makes a lot of sense. The new EXTRA_INSTALL makefile variable ought to be documented in extend.sgml, where we list REGRESS_OPTS and others. 
- Heikki
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Fabrízio de Royes Mello wrote: On Sat, Aug 23, 2014 at 8:53 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Aug 23, 2014 at 3:32 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Great. Pushed. Thanks for the patch. There is a typo in what has been pushed. Patch attached. Thanks... I fixed that in my last patch to change 'loggedness' to 'persistence'. Attached. Thanks, pushed. I also added a comment explaining why it's okay to do what we're doing. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Specifying the unit in storage parameter
On Thu, Aug 21, 2014 at 4:20 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Aug 8, 2014 at 12:32 PM, Fujii Masao masao.fu...@gmail.com wrote: This is not user-friendly. I'd like to propose the attached patch which introduces the infrastructure which allows us to specify the unit when setting INTEGER storage parameter like autovacuum_vacuum_cost_delay. Comment? Review? This patch makes autovacuum_vacuum_cost_delay more consistent with what is at server level. So +1. Thanks for reviewing the patch! Looking at the patch, the parameter fillfactor in the category RELOPT_KIND_HEAP (the first element in intRelOpts of reloptions.c) is not updated with the new field. It is only a one-line change. @@ -97,7 +97,7 @@ static relopt_int intRelOpts[] = Packs table pages only to this percentage, RELOPT_KIND_HEAP }, - HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100 + HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0 }, Oh, good catch. I wonder why I did such a mistake... Attached is the updated version of the patch. Except that, I tested as well the patch and it works as expected. The documentation, as well as the regression tests remain untouched, but I guess that this is fine (not seeing similar tests in regressions, and documentation does not specify the unit for a given parameter). I think that it's worth adding the regression test for this feature. Attached patch updates the regression test. Regards, -- Fujii Masao *** a/src/backend/access/common/reloptions.c --- b/src/backend/access/common/reloptions.c *** *** 97,103 static relopt_int intRelOpts[] = Packs table pages only to this percentage, RELOPT_KIND_HEAP }, ! HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100 }, { { --- 97,103 Packs table pages only to this percentage, RELOPT_KIND_HEAP }, ! HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0 }, { { *** *** 105,111 static relopt_int intRelOpts[] = Packs btree index pages only to this percentage, RELOPT_KIND_BTREE }, ! 
BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100 }, { { --- 105,111 Packs btree index pages only to this percentage, RELOPT_KIND_BTREE }, ! BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100, 0 }, { { *** *** 113,119 static relopt_int intRelOpts[] = Packs hash index pages only to this percentage, RELOPT_KIND_HASH }, ! HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100 }, { { --- 113,119 Packs hash index pages only to this percentage, RELOPT_KIND_HASH }, ! HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100, 0 }, { { *** *** 121,127 static relopt_int intRelOpts[] = Packs gist index pages only to this percentage, RELOPT_KIND_GIST }, ! GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100 }, { { --- 121,127 Packs gist index pages only to this percentage, RELOPT_KIND_GIST }, ! GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100, 0 }, { { *** *** 129,135 static relopt_int intRelOpts[] = Packs spgist index pages only to this percentage, RELOPT_KIND_SPGIST }, ! SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100 }, { { --- 129,135 Packs spgist index pages only to this percentage, RELOPT_KIND_SPGIST }, ! SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100, 0 }, { { *** *** 137,143 static relopt_int intRelOpts[] = Minimum number of tuple updates or deletes prior to vacuum, RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, INT_MAX }, { { --- 137,143 Minimum number of tuple updates or deletes prior to vacuum, RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, INT_MAX, 0 }, { { *** *** 145,151 static relopt_int intRelOpts[] = Minimum number of tuple inserts, updates or deletes prior to analyze, RELOPT_KIND_HEAP }, ! -1, 0, INT_MAX }, { { --- 145,151 Minimum number of tuple inserts, updates or deletes prior to analyze, RELOPT_KIND_HEAP }, ! -1, 0, INT_MAX, 0 }, { { *** *** 153,159 static relopt_int intRelOpts[] = Vacuum cost delay in milliseconds, for autovacuum, RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! 
-1, 0, 100 }, { { --- 153,159 Vacuum cost delay in milliseconds, for autovacuum, RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 100, GUC_UNIT_MS }, { { *** *** 161,167 static relopt_int intRelOpts[] = Vacuum cost amount available before napping, for autovacuum, RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 1, 1 }, { { --- 161,167
Re: [HACKERS] Hardening pg_upgrade
On Mon, Aug 25, 2014 at 06:34:12AM -0700, Kevin Grittner wrote: Bernd Helmle maili...@oopsware.de wrote: Magnus Hagander mag...@hagander.net wrote: I vote for discarding 8.3 support in pg_upgrade. There are already enough limitations on pg_upgrade from pre-8.4 to make it of questionable value; if it's going to create problems like this, it's time to cut the rope. +1. 8.3 has been unsupported for a fairly long time now, and you can still do a two-step upgrade if you're on that old a version. Also +1 from my side. I've seen some old 8.3 installations at customers, still, but they aren't large and can easily be upgraded with a two-step upgrade. +1 If we could leave it without it being any extra work, fine; but once a release is over a year out of support, if it's a matter of putting extra work on the pg hackers or on the users who have chosen to wait more than a year after support ends to do the upgrade, I'm OK with asking those users to do a two-phase upgrade or fall back to pg_dump. It's not like we're leaving them without any options. OK, I will move in the direction of removing 8.3 support and use a single query to pull schema information. I was hesitant to remove 8.3 support as I know we have kept pg_dump support all the way back to 7.0, but it seems pg_upgrade need not have the same version requirements. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] psql \watch versus \timing
On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/18/2014 10:51 AM, Michael Paquier wrote: On Mon, Aug 18, 2014 at 4:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 18, 2014 at 3:19 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Aug 14, 2014 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote: Attached patch changes \watch so that it displays how long the query takes if \timing is enabled. I didn't refactor PSQLexec and SendQuery into one routine because the contents of those functions are not so same. I'm not sure how much it's worth doing that refactoring. Anyway this feature is quite useful even without that refactoring, I think. The patch applies correctly and it does correctly what it is made for: =# \timing Timing is on. =# select 1; ?column? -- 1 (1 row) Time: 0.407 ms =# \watch 1 Watch every 1sMon Aug 18 15:17:41 2014 ?column? -- 1 (1 row) Time: 0.397 ms Watch every 1sMon Aug 18 15:17:42 2014 ?column? -- 1 (1 row) Time: 0.615 ms Refactoring it would be worth it thinking long-term... And printing the timing in PSQLexec code path is already done in SendQuery, so that's doing two times the same thing IMHO. Now, looking at the patch, introducing the new function PSQLexecInternal with an additional parameter to control the timing is correct choosing the non-refactoring way of doing. But I don't think that printing the time outside PSQLexecInternal is consistent with SendQuery. Why not simply control the timing with a boolean flag and print the timing directly in PSQLexecInternal? Because the timing needs to be printed after the query result. Thanks for pointing that. Yes this makes the refactoring a bit more difficult. Michael reviewed this, so I'm marking this as Ready for Committer. Since you're a committer yourself, I expect you'll take it over from here. Yep! I agree that refactoring this would be nice in the long-term, and I also agree that it's probably OK as it is in the short-term. 
I don't like the name PSQLexecInternal, though. PSQLexec is used for internal commands anyway. In fact it's backwards, because PSQLexecInternal is used for non-internal queries, given by \watch, while PSQLexec is used for internal commands. Agreed. So what about PSQLexecCommon (inspired by the relation between LWLockAcquireCommon and LWLockAcquire)? Or any better name? Regards, -- Fujii Masao
Re: [HACKERS] LIMIT for UPDATE and DELETE
On Sun, Aug 24, 2014 at 11:48 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Hi Rukh, (2014/08/15 6:18), Rukh Meski wrote: Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently. Before looking into the patch, I'd like to know the use cases in more details. There are two common use cases I can think of: 1) I've just added a column to an existing table, and it is all NULL. I've changed the code to populate that column appropriately for new or updated rows, but I need to back fill the existing rows. I have a (slow) method to compute the new value. (I've not yet changed the code to depend on that column being populated) The obvious solution is: update the_table set new_col=populate_new_col(whatever) where new_col is null. But this will bloat the table because vacuum cannot intervene, and will take a very long time. The first row to be updated will remain locked until the last row gets updated, which is not acceptable. And if something goes wrong before the commit, you've lost all the work. With the limit clause, you can just do this: update the_table set new_col=populate_new_col(whatever) where new_col is null limit 5; In a loop with appropriate vacuuming and throttling. 2) I've introduced or re-designed partitioning, and need to migrate rows to the appropriate partitions without long-lived row locks. create table pgbench_accounts2 () inherits (pgbench_accounts); and then in a loop: with t as (delete from only pgbench_accounts where aid < 50 limit 5000 returning *) insert into pgbench_accounts2 select * from t; Cheers, Jeff
Re: [HACKERS] Specifying the unit in storage parameter
Fujii Masao wrote: On Thu, Aug 21, 2014 at 4:20 PM, Michael Paquier michael.paqu...@gmail.com wrote: Looking at the patch, the parameter fillfactor in the category RELOPT_KIND_HEAP (the first element in intRelOpts of reloptions.c) is not updated with the new field. It is only a one-line change. @@ -97,7 +97,7 @@ static relopt_int intRelOpts[] = Packs table pages only to this percentage, RELOPT_KIND_HEAP }, - HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100 + HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0 }, Oh, good catch. I wonder why I made such a mistake... Uninitialized elements at the end of a struct initializer are filled with zeroes. We do have other examples of this -- for instance, the config_generic entries in the guc.c tables are almost always only 5 members long even though the struct is quite a bit longer than that. Most entries do not even have flags set. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Mon, Aug 25, 2014 at 2:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fabrízio de Royes Mello wrote: On Sat, Aug 23, 2014 at 8:53 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Aug 23, 2014 at 3:32 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Great. Pushed. Thanks for the patch. There is a typo in what has been pushed. Patch attached. Thanks... I fixed that in my last patch to change 'loggedness' to 'persistence'. Attached. Thanks, pushed. I also added a comment explaining why it's okay to do what we're doing. Thanks... I'm working on a refactoring to pass down the relpersistence flag to finish_heap_swap... is this valid for now, or should I leave it for another patch? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Verbose output of pg_dump not show schema name
On 08/20/2014 11:11 PM, Fabrízio de Royes Mello wrote: On Wed, Aug 20, 2014 at 2:43 AM, Michael Paquier michael.paqu...@gmail.com wrote: I had a look at this patch, and here are a couple of comments: 1) Depending on how ArchiveEntry is called to register an object to dump, namespace may be NULL, but it is not the case for namespace->dobj.name, so you could get the namespace name at the top of the functions that have their verbose output improved with something like that: const char *namespace = tbinfo->dobj.namespace ? tbinfo->dobj.namespace->dobj.name : NULL; And then simplify the message output as follows: if (namespace) write_msg(blah \"%s\".\"%s\" blah, namespace, classname); else write_msg(blah \"%s\" blah, classname); You can as well safely remove the checks on namespace->dobj.name. Ok AFAICS, the namespace can never be NULL in any of these. There is a selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name) call before or after printing the message, so if tbinfo->dobj.namespace is NULL, you'll crash anyway. Please double-check, and remove the dead code if you agree. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrently option for reindexdb
On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Mon, Aug 25, 2014 at 3:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Attached WIP patch adds a -C (--concurrently) option to the reindexdb command for concurrent reindexing. If we specify the -C option with any table then reindexdb does the reindexing concurrently, with the minimum lock necessary. Note that we cannot use the '-s' option (for system catalogs) and the '-C' option at the same time. This patch uses a simple method, as follows. 1. Do CREATE INDEX CONCURRENTLY to build a new index which has the same definition as the target index 2. Acquire an ACCESS EXCLUSIVE LOCK on the target table (and a transaction starts) 3. Swap the old and new indexes 4. Drop the old index 5. COMMIT This process is based on what pg_repack (or pg_reorg) does, done via SQL. +1. I have some shell scripts which do that reindex technique, and I'd be happy if I can replace them with this feature. Can this technique reindex the primary key index and indexes which other objects depend on (e.g., foreign keys)? This would be useful for users, but I am not sure that you can call that --concurrently, as the rename/swap phase requires an exclusive lock, and you would actually block a real implementation of REINDEX CONCURRENTLY (hm...). It might be difficult to call this --concurrently; the name might need to be changed. I'm OK to say that as --concurrently if the document clearly explains that restriction. Or --almost-concurrently? ;P ToDo - Multi language support for log message. Why? I am not sure that's something you should deal with. The log messages which already exist support multiple languages using .po files, but the newly added messages have no corresponding entries in the .po files, I thought. I don't think that you need to add the update of the .po files into the feature patch. 
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
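The five-step procedure described in the message above can already be run by hand today; here is a minimal SQL sketch (table and index names are placeholders, and the "swap" is shown as a pair of renames, which is a simplification of the catalog-level relfilenode swap that pg_repack actually performs):

```sql
-- Step 1: build the replacement without blocking writers
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY idx_foo_tmp ON foo (bar);

-- Steps 2-5: hold the exclusive lock only for the short swap and drop.
BEGIN;
LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;
ALTER INDEX idx_foo RENAME TO idx_foo_old;
ALTER INDEX idx_foo_tmp RENAME TO idx_foo;
DROP INDEX idx_foo_old;
COMMIT;
```

Note that, as Fujii asks above, this simple form does not handle indexes backing constraints (primary keys, or unique indexes that foreign keys depend on); that is exactly where pg_repack resorts to direct catalog manipulation.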
Re: [HACKERS] Hardening pg_upgrade
Bruce Momjian br...@momjian.us writes: OK, I will move in the direction of removing 8.3 support and use a single query to pull schema information. I was hesitant to remove 8.3 support as I know we have kept pg_dump support all the way back to 7.0, but it seems pg_upgrade need not have the same version requirements. Not really related, but ... I've been thinking that it's time to rip out pg_dump's support for server versions before 7.3 or 7.4. That would let us get rid of a lot of klugy code associated with the lack of schemas and dependency info in the older versions. It's possible that we should move the cutoff even further --- I've not looked closely at how much could be removed by dropping versions later than 7.3. Aside from the question of how much old code could be removed, there's the salient point of how do we test pg_dump against such old branches? The further back you go the harder it is to even build PG on modern platforms, and the less likely it will work (I note for example that pre-8.0 configure doesn't try to use -fwrapv, let alone some of the other switches we've found necessary on recent gcc). I've usually tested pg_dump patches against old servers by running them against builds I have in captivity on my old HPPA box ... but once that dies, I'm *not* looking forward to trying to rebuild 7.x on my current machines. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 06/18/2014 09:17 PM, Josh Berkus wrote: On 06/18/2014 11:03 AM, Andres Freund wrote: Well, all those actually do write to the xlog (to write a new checkpoint, containing the updated control file). Since pg_resetxlog has done all this pretty much since forever, renaming it now seems to be a big hassle for users for pretty much no benefit? This isn't a tool the average user should ever touch. If we're using it to create a unique system ID which can be used to orchestrate replication and clustering systems, a lot more people are going to be touching it than ever did before -- and not just for BDR. I think pg_resetxlog is still appropriate: changing the system ID will reset the WAL. In particular, any archived WAL will become useless. But yeah, this does change the target audience of pg_resetxlog significantly. Now that we'll have admins running pg_resetxlog as part of normal operations, we have to document it carefully. We also have to design the user interface carefully, to make it more clear that while resetting the system ID won't eat your data, some of the other settings will. The proposed pg_resetxlog --help output looks like this:

pg_resetxlog resets the PostgreSQL transaction log.

Usage:
  pg_resetxlog [OPTION]... DATADIR

Options:
  -e XIDEPOCH      set next transaction ID epoch
  -f               force update to be done
  -l XLOGFILE      force minimum WAL starting location for new transaction log
  -m MXID,MXID     set next and oldest multitransaction ID
  -n               no update, just show what would be done (for testing)
  -o OID           set next OID
  -O OFFSET        set next multitransaction offset
  -s [SYSID]       set system identifier (or generate one)
  -V, --version    output version information, then exit
  -x XID           set next transaction ID
  -?, --help       show this help, then exit

Report bugs to pgsql-b...@postgresql.org.

I don't think that's good enough. The -s option should be displayed more prominently, and the dangerous options like -l and -x should be more clearly labeled as such. 
I would de-emphasize setting the system ID to a particular value. It might be useful for disaster recovery, like -x, but in general you should always reset it to a new unique value. If you make it too easy to set it to a particular value, someone will try to initialize a streaming standby server using initdb+pg_dump, and changing the system ID to match the master's. The user manual for pg_resetxlog says: pg_resetxlog clears the write-ahead log (WAL) and optionally resets some other control information stored in the pg_control file. This function is sometimes needed if these files have become corrupted. It should be used only as a last resort, when the server will not start due to such corruption. That's clearly not true for the -s option. In summary, I think we want this feature in some form, but we'll somehow need to make the distinction from the dangerous pg_resetxlog usage. It might be best, after all, to make this a separate utility, pg_resetsystemid. It would not need to have the capability to set the system ID to a particular value, only a randomly assigned one (setting it to a particular value could be added to pg_resetxlog, where other dangerous options are). - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hardening pg_upgrade
On Mon, Aug 25, 2014 at 03:04:52PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: OK, I will move in the direction of removing 8.3 support and use a single query to pull schema information. I was hesitant to remove 8.3 support as I know we have kept pg_dump support all the way back to 7.0, but it seems pg_upgrade need not have the same version requirements. Not really related, but ... I've been thinking that it's time to rip out pg_dump's support for server versions before 7.3 or 7.4. That would let us get rid of a lot of klugy code associated with the lack of schemas and dependency info in the older versions. It's possible that we should move the cutoff even further --- I've not looked closely at how much could be removed by dropping versions later than 7.3. Yeah, it kind of is related, as that was the logic I followed originally for pg_upgrade, i.e. never remove supported versions --- that has been overridden. Aside from the question of how much old code could be removed, there's the salient point of how do we test pg_dump against such old branches? The further back you go the harder it is to even build PG on modern platforms, and the less likely it will work (I note for example that pre-8.0 configure doesn't try to use -fwrapv, let alone some of the other switches we've found necessary on recent gcc). I've usually tested pg_dump patches against old servers by running them against builds I have in captivity on my old HPPA box ... but once that dies, I'm *not* looking forward to trying to rebuild 7.x on my current machines. Yes. You could argue that a double-upgrade from 7.0 to 8.0 to 9.4 would be less buggy than one from 7.0 to 9.4. I agree there is almost zero testing of very old versions. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall reccomendation in release notes
On Thu, Aug 21, 2014 at 12:18:46PM -0400, Bruce Momjian wrote: I have developed the attached patch to address the issues raised above:

o non-text output of pg_dump is mentioned
o mentions of using OID for keys is removed
o the necessity of pg_dumpall --globals-only is mentioned
o using pg_dump parallel mode rather than pg_dumpall for upgrades is mentioned
o pg_upgrade is mentioned more prominently for upgrades
o replication upgrades are in their own section

I don't think we want to mention pg_upgrade as the _primary_ major-version upgrade method. While the pg_dump upgrade section is long, it is mostly about starting/stopping the server, moving directories, etc, the same things you have to do for pg_upgrade, so I just mentioned that in the pg_upgrade section. Other ideas? I plan to apply this to head and 9.4. Updated patch attached and applied. Any other suggestions? Please let me know. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml new file mode 100644 index 06f064e..07ca0dc *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** *** 28,34 titleacronymSQL/ Dump/title para !The idea behind this dump method is to generate a text file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. productnamePostgreSQL/ provides the utility program --- 28,34 titleacronymSQL/ Dump/title para !The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. productnamePostgreSQL/ provides the utility program *** pg_dump replaceable class=parameterd *** 39,44 --- 39,47 /synopsis As you see, applicationpg_dump/ writes its result to the standard output. We will see below how this can be useful. 
+While the above command creates a text file, applicationpg_dump/ +can create files in other formats that allow for parallelism and more +fine-grained control of object restoration. /para para *** pg_dump replaceable class=parameterd *** 98,117 exclusive lock, such as most forms of commandALTER TABLE/command.) /para - important -para - If your database schema relies on OIDs (for instance, as foreign - keys) you must instruct applicationpg_dump/ to dump the OIDs - as well. To do this, use the option-o/option command-line - option. -/para - /important - sect2 id=backup-dump-restore titleRestoring the Dump/title para ! The text files created by applicationpg_dump/ are intended to be read in by the applicationpsql/application program. The general command form to restore a dump is synopsis --- 101,111 exclusive lock, such as most forms of commandALTER TABLE/command.) /para sect2 id=backup-dump-restore titleRestoring the Dump/title para ! Text files created by applicationpg_dump/ are intended to be read in by the applicationpsql/application program. The general command form to restore a dump is synopsis *** psql replaceable class=parameterdbna *** 127,132 --- 121,128 supports options similar to applicationpg_dump/ for specifying the database server to connect to and the user name to use. See the xref linkend=app-psql reference page for more information. + Non-text file dumps are restored using the xref + linkend=app-pgrestore utility. /para para *** psql -f replaceable class=parameteri *** 225,231 roles, tablespaces, and empty databases, then invoking applicationpg_dump/ for each database. This means that while each database will be internally consistent, the snapshots of ! different databases might not be exactly in-sync. /para /sect2 --- 221,234 roles, tablespaces, and empty databases, then invoking applicationpg_dump/ for each database. This means that while each database will be internally consistent, the snapshots of ! different databases are not synchronized. !/para ! 
!para ! Cluster-wide data can be dumped alone using the ! applicationpg_dumpall/ option--globals-only/ option. ! This is necessary to fully backup the cluster if running the ! applicationpg_dump/ command on individual databases. /para /sect2 diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml new file mode 100644 index 1d91d92..f337485 *** a/doc/src/sgml/runtime.sgml --- b/doc/src/sgml/runtime.sgml *** $ userinputkill -INT `head -1 /usr/loc *** 1517,1524 For
Re: [HACKERS] psql \watch versus \timing
On 08/25/2014 09:22 PM, Fujii Masao wrote: On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I agree that refactoring this would be nice in the long-term, and I also agree that it's probably OK as it is in the short-term. I don't like the name PSQLexecInternal, though. PSQLexec is used for internal commands anyway. In fact it's backwards, because PSQLexecInternal is used for non-internal queries, given by \watch, while PSQLexec is used for internal commands. Agreed. So what about PSQLexecCommon (inspired by the relation between LWLockAcquireCommon and LWLockAcquire)? Or any better name? Actually, perhaps it would be better to just copy-paste PSQLexec, and modify the copy to suit \watch's needs. (PSQLexecWatch? SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much overlap between what \watch wants and what other PSQLexec callers want. \watch wants timing output, others don't. \watch doesn't want transaction handling. Do we want --echo-hidden to print the \watch'd query? Not sure.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set new system identifier using pg_resetxlog
Heikki Linnakangas hlinnakan...@vmware.com writes: In summary, I think we want this feature in some form, but we'll somehow need to make the distinction from the dangerous pg_resetxlog usage. It might be best, after all, to make this a separate utility, pg_resetsystemid. That sounds fairly reasonable given your point about not wanting people to confuse this with the can-eat-your-data aspects of pg_resetxlog. (OTOH, won't this result in a lot of code duplication? We'd still need to erase and refill the WAL area.) It would not need to have the capability to set the system ID to a particular value, only a randomly assigned one (setting it to a particular value could be added to pg_resetxlog, where other dangerous options are). I'm less convinced about that. While you can shoot yourself in the foot by assigning the same system ID to two installations that share WAL archive or something like that, this feels a bit different than the ways you can shoot yourself in the foot with pg_resetxlog. If we do what you say here then I think we'll be right back to the discussion of how to separate the assign-a-sysID option from pg_resetxlog's other, more dangerous options. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set new system identifier using pg_resetxlog
On August 25, 2014 9:45:50 PM CEST, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: In summary, I think we want this feature in some form, but we'll somehow need to make the distinction from the dangerous pg_resetxlog usage. It might be best, after all, to make this a separate utility, pg_resetsystemid. That sounds fairly reasonable given your point about not wanting people to confuse this with the can-eat-your-data aspects of pg_resetxlog. (OTOH, won't this result in a lot of code duplication? We'd still need to erase and refill the WAL area.) Let's move pg_controldata, pg_resetxlog, and pg_resetsysid into a common src/bin directory. Then we can unify the relevant code between all three. -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \watch versus \timing
On 08/25/2014 10:48 PM, Heikki Linnakangas wrote: On 08/25/2014 09:22 PM, Fujii Masao wrote: On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I agree that refactoring this would be nice in the long-term, and I also agree that it's probably OK as it is in the short-term. I don't like the name PSQLexecInternal, though. PSQLexec is used for internal commands anyway. In fact it's backwards, because PSQLexecInternal is used for non-internal queries, given by \watch, while PSQLexec is used for internal commands. Agreed. So what about PSQLexecCommon (inspired by the relation between LWLockAcquireCommon and LWLockAcquire)? Or any better name? Actually, perhaps it would be better to just copy-paste PSQLexec, and modify the copy to suit \watch's needs. (PSQLexecWatch? SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much overlap between what \watch wants and what other PSQLexec callers want. \watch wants timing output, others don't. \watch doesn't want transaction handling. Do we want --echo-hidden to print the \watch'd query? Not sure.. BTW, I just noticed that none of the callers of PSQLexec pass start_xact=true. So that part of the function is dead code. We might want to remove it, and replace with a comment noting that PSQLexec never starts a new transaction block, even in autocommit-off mode. (I know you're hacking on this, so I didn't want to joggle your elbow by doing it right now) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
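For reference, the user-visible behavior being discussed can be reproduced in any psql session; a minimal sketch (any query works, and a running server is assumed):

```sql
-- In psql: enable timing, run a query once, then repeat it every 2 seconds.
\timing on
SELECT count(*) FROM pg_stat_activity;
\watch 2
```

The one-shot run prints a "Time: ..." line; the repeated executions driven by \watch do not honor \timing, which is what the PSQLexec refactoring above is meant to fix.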
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 08/25/2014 10:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: It would not need to have the capability to set the system ID to a particular value, only a randomly assigned one (setting it to a particular value could be added to pg_resetxlog, where other dangerous options are). I'm less convinced about that. While you can shoot yourself in the foot by assigning the same system ID to two installations that share WAL archive or something like that, this feels a bit different than the ways you can shoot yourself in the foot with pg_resetxlog. If we do what you say here then I think we'll be right back to the discussion of how to separate the assign-a-sysID option from pg_resetxlog's other, more dangerous options. I don't see the use case for setting system id to a particular value. Andres listed four use cases upthread: a) Mark a database as not being the same. Currently if you promote two databases, e.g. to shard out, they'll continue to have same system identifier. Which really sucks, especially because timeline ids will often increase synchronously. Yes, this is the legitimate use case a DBA would use this feature for. Resetting the system ID to a random value suffices. b) For data recovery it's sometimes useful to create a new database (with the same catalog state) and replay all WAL. For that you need to reset the system identifier. I've done so hacking up resetxlog before. This falls squarely in the dangerous category, and you'll have to reset other things than the system ID to make it work. Having the option in pg_resetxlog is fine for this. c) We already allow to set pretty much all aspects of the control file via resetxlog - there seems little point of not having the ability to change the system identifier. Ok, but it's not something a regular admin would ever use. d) In a logical replication scenario one way to identify individual nodes is via the system identifier. 
If you want to convert a basebackup into logical standby one sensible way to do so is to create a logical replication slots *before* promoting a physical backup to guarantee that slot is able to stream out all changes. If the slot names contain the consumer's system identifier you need to know the new system identifier beforehand. I didn't understand this one. But it seems like the obvious solution is to not use the consumer's system identifier as the slot name. Or rename it afterwards. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrently option for reindexdb
On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: this might be difficult to call this as --concurrently. It might need to be change the name. I'm OK to say that as --concurrently if the document clearly explains that restriction. Or --almost-concurrently? ;P By reading that I am thinking as well about a wording with lock, like --minimum-locks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Here's a refreshed version of this patch. I have split it up in a largish number of pieces, which hopefully makes it easier to understand what is going on. Alvaro, Could you confirm that the patches you just committed are 1, 3 and 6? Regards, -- Michael
[HACKERS] postgresql latency bgwriter not doing its job
Hello pgdevs, I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, to have a responsive server under small to medium load. On an old computer with a software RAID5 HDD attached, the pgbench simple update script run for some time (scale 100, fillfactor 95) pgbench -M prepared -N -c 2 -T 500 -P 1 ... gives 300 tps. However this performance is really +1000 tps for a few seconds followed by 16 seconds at about 0 tps for the checkpoint-induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not. Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above max): pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ... The same thing more or less happens in a delayed fashion... You get 50 tps for some time, followed by sections of 15 seconds at 0 tps for the checkpoint when the segments are full... the server is unresponsive about 10% of the time (one in ten transactions is late by more than 200 ms). It is not satisfying, pg should be able to handle that load easily. The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that checkpoint would find a clean field and pass in a blink. Indeed, at the end of the 500 seconds throttled test, pg_stat_bgwriter says: buffers_checkpoint = 19046 buffers_clean = 2995 Which suggests that bgwriter took it upon itself to write only 6 pages per second, where at least 50 would have been needed for the load, and could have been handled by the hardware without any problem. I have not found any means to force bgwriter to send writes when it can. (Well, I have: create a process which sends CHECKPOINT every 0.2 seconds... 
it works more or less, but this is not my point:-) Bgwriter control parameters allow one to control the maximum number of pages (bgwriter_lru_maxpages) written per round (bgwriter_delay), and a multiplier (bgwriter_lru_multiplier) which controls some heuristics to estimate how many pages should be needed so as to make them available. This may be nice in some settings, but is not adapted to the write-oriented OLTP load tested with pgbench. The problem is that with the update load on a database that fits in memory there is not that much need of new pages, even if pages are being dirtied (about 50 per second), so it seems that the heuristics decide not to write much. The net result of all this cleverness is that when the checkpoint arrives, several thousand pages have to be written and the server is offline for some time. ISTM that bgwriter lacks at least some "min pages" setting: it could be induced to free this many pages if it can. That would be a start. A better feature would be that it adapts itself to take advantage of the available IOPS, depending on the load induced by other tasks (vacuum, queries...), in a preventive manner, so as to avoid delaying what can be done right now under a small load, and thus avoid later IO storms. This would suggest that some setting would provide the expected IOPS capability of the underlying hardware, as some already suggest the expected available memory. Note that this preventive approach could also improve the bandwidth measure: currently when pgbench is running at maximum speed before the checkpoint storm, nothing is written to disk but WAL, although it could probably also write some dirty pages. When the checkpoint arrives, fewer pages would need to be written, so the storm would be shorter. Any thoughts on this latency issue? Am I missing something? -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
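Fabien's 6-pages-per-second figure can be reproduced directly from the counters he quotes; a minimal sketch (this assumes pg_stat_bgwriter was reset at the start of the 500-second run, and a live server):

```sql
-- Pages per second written by checkpoints vs. by the bgwriter,
-- over a run of known duration (500 s in the test above).
SELECT buffers_checkpoint / 500.0 AS checkpoint_pages_per_sec,  -- 19046/500 ~ 38
       buffers_clean      / 500.0 AS bgwriter_pages_per_sec     -- 2995/500  ~ 6
FROM pg_stat_bgwriter;
```

With the quoted numbers, checkpoints wrote about 38 pages/s while the bgwriter wrote about 6 pages/s, which is the imbalance the message is complaining about.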
Re: [HACKERS] pg_upgrade: allow multiple -o/-O options
On Fri, Aug 22, 2014 at 10:02:11AM -0400, Bruce Momjian wrote: On Fri, Aug 22, 2014 at 10:52:12AM +0200, Pavel Raiskup wrote: On Thursday 21 of August 2014 18:26:37 Bruce Momjian wrote: On Tue, Mar 4, 2014 at 04:52:56PM +0100, Pavel Raiskup wrote: RFE: Consider that you want to run pg_upgrade via some script with some default '-o' option. But then you also want to give the script's user a chance to specify the old-server's options according to the user's needs. Then something like the following is not possible: $ cat script ... pg_upgrade ... -o 'sth' $PG_UPGRADE_OPT ... ... I know that this problem is still script-able, but the fix should be innocent and it would simplify things. Thanks for considering, Attached is a patch that makes multiple -o options append their arguments for pg_upgrade and pg_ctl, and documents this and the append behavior of postmaster/postgres. This covers all the -o behaviors. Thanks! Seems to be OK to me, one nit - why did you not go the append_option way (there could probably be a better name, like arg_cat)? Because this is just about a few lines, it is probably OK from PostgreSQL policy POV, so review? ~ review+, thanks again! Well, I found append_option() to be an extra function that wasn't necessary --- the psprintf() use was short enough not to need a separate function. Patch applied; this will appear in PG 9.5. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
Michael Paquier wrote: On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Here's a refreshed version of this patch. I have split it up in a largish number of pieces, which hopefully makes it easier to understand what is going on. Alvaro, Could you confirm that the patches you just committed are 1, 3 and 6? And 4. Yes, they are. I wanted to get trivial stuff out of the way while I had some other trivial patch at hand. I'm dealing with another patch from the commitfest now, so I'm not posting a rebased version right away, apologies. How do people like this patch series? It would be much easier for me to submit a single patch, but I feel handing it in little pieces makes it easier for reviewers. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrently option for reindexdb
Michael Paquier wrote: On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: this might be difficult to call this as --concurrently. It might need to be change the name. I'm OK to say that as --concurrently if the document clearly explains that restriction. Or --almost-concurrently? ;P By reading that I am thinking as well about a wording with lock, like --minimum-locks. Why not just finish up the REINDEX CONCURRENTLY patch. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrently option for reindexdb
On August 25, 2014 10:35:20 PM CEST, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: this might be difficult to call this as --concurrently. It might need to be change the name. I'm OK to say that as --concurrently if the document clearly explains that restriction. Or --almost-concurrently? ;P By reading that I am thinking as well about a wording with lock, like --minimum-locks. Why not just finish up the REINDEX CONCURRENTLY patch. +many. Although I'm not sure if we managed to find a safe relation swap. If not: How about adding ALTER INDEX ... SWAP which requires an exclusive lock but is fast and O(1)? Then all indexes can be created concurrently, swapped in a very short xact, and then dropped concurrently? 95% of all users would be happy with that and the remaining 5% would still be in a better position than today where the catalog needs to be hacked for that (fkeys, pkeys et al). --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
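Andres's proposal above, written out as a sketch with entirely hypothetical syntax (no ALTER INDEX ... SWAP command exists; this only illustrates the shape of the idea, with placeholder table and index names):

```sql
-- HYPOTHETICAL syntax: illustrates the proposal only, not implemented anywhere.
CREATE INDEX CONCURRENTLY idx_foo_new ON foo (bar);

BEGIN;  -- the exclusive lock is held only for an O(1) catalog swap
ALTER INDEX idx_foo SWAP WITH idx_foo_new;
COMMIT;

DROP INDEX CONCURRENTLY idx_foo_new;  -- after the swap, this names the old index data
```

The point of the proposal is that only the swap step needs the exclusive lock, so the blocking window is constant-time, while the expensive build and drop both run concurrently.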
Re: [HACKERS] Final Patch for GROUPING SETS
Hi

I checked this patch, and it is working very well. I found only two issues - and I am not sure they are issues. (Data from https://wiki.postgresql.org/wiki/Grouping_Sets)

postgres=# select name, place, sum(count), grouping(name), grouping(place) from cars group by rollup(name, place);
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
       |            | 30100 |        1 |        1
(10 rows)

* redundant sets should be ignored:

postgres=# select name, place, sum(count), grouping(name), grouping(place) from cars group by rollup(name, place), name;
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
 skoda |            | 15000 |        0 |        1
(12 rows)

It duplicates rows.

postgres=# explain select name, place, sum(count), grouping(name), grouping(place) from cars group by rollup(name, place), name;
                          QUERY PLAN
--------------------------------------------------------------
 GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
   Grouping Sets: (name, place), (name), (name)
   ->  Sort  (cost=101.14..101.15 rows=6 width=68)
         Sort Key: name, place
         ->  Seq Scan on cars  (cost=0.00..1.06 rows=6 width=68)
 Planning time: 0.235 ms
(6 rows)

postgres=# select name, place, sum(count), grouping(name), grouping(place) from cars group by grouping sets((name, place), (name), (name), (place), ());
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
 skoda |            | 15000 |        0 |        1
       |            | 30100 |        1 |        1
       | czech rep. | 17100 |        1 |        0
       | germany    | 13000 |        1 |        0
(15 rows)

Fantastic work.

Regards

Pavel

2014-08-25 7:21 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk: Here is the new version of our grouping sets patch. This version supersedes the previous post. We believe the functionality of this version to be substantially complete, providing all the standard grouping set features except T434 (GROUP BY DISTINCT). (Additional tweaks, such as extra variants on GROUPING(), could be added for compatibility with other databases.) Since the debate regarding reserved keywords has not produced any useful answer, the main patch here makes CUBE and ROLLUP into col_name_reserved keywords, but a separate small patch is attached to make them unreserved_keywords instead. So there are now 5 files:

gsp1.patch - phase 1 code patch (full syntax, limited functionality)
gsp2.patch - phase 2 code patch (adds full functionality using the new chained aggregate mechanism)
gsp-doc.patch - docs
gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance, intended primarily for testing pending a decision on renaming contrib/cube or unreserving keywords
gsp-u.patch - proposed method to unreserve CUBE and ROLLUP

-- Andrew (irc:RhodiumToad)
Re: [HACKERS] Is this a bug?
On Fri, Aug 22, 2014 at 10:04:50PM -0400, Bruce Momjian wrote: On Fri, Aug 22, 2014 at 03:12:47PM -0400, Robert Haas wrote: On Fri, Aug 22, 2014 at 2:33 PM, Bruce Momjian br...@momjian.us wrote: Yes, you remember well. I will have to find a different way for pg_upgrade to call a no-op ALTER TABLE, which is fine. Looking at the ALTER TABLE options, I am going to put this check in a !IsBinaryUpgrade block so pg_upgrade can still use its trick. -1, that's really ugly. Maybe the right solution is to add a form of ALTER TABLE that is specifically defined to do only this check. This is an ongoing need, so that might not be out of line. Ah, it seems ALTER TABLE ... DROP CONSTRAINT IF EXISTS also works --- I will use that.

OK, attached patch applied, with pg_upgrade adjustments. I didn't think the original regression tests for this were necessary.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index e623a22..29a68c0
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** optionally_create_toast_tables(void)
*** 115,120
--- 115,124
 			 c.relkind IN ('r', 'm') AND
 			 c.reltoastrelid = 0);

+ 	/* Suppress NOTICE output from non-existent constraints */
+ 	PQclear(executeQueryOrDie(conn, "SET client_min_messages = warning;"));
+ 	PQclear(executeQueryOrDie(conn, "SET log_min_messages = warning;"));
+ 
 	ntups = PQntuples(res);
 	i_nspname = PQfnumber(res, "nspname");
 	i_relname = PQfnumber(res, "relname");

*** optionally_create_toast_tables(void)
*** 125,137
 			 OPTIONALLY_CREATE_TOAST_OID));

 		/* dummy command that also triggers check for required TOAST table */
!		PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s RESET (binary_upgrade_dummy_option);",
 			 quote_identifier(PQgetvalue(res, rowno, i_nspname)),
 			 quote_identifier(PQgetvalue(res, rowno, i_relname))));
 	}

 	PQclear(res);
 	PQfinish(conn);
 }
--- 129,144
 			 OPTIONALLY_CREATE_TOAST_OID));

 		/* dummy command that also triggers check for required TOAST table */
!		PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s DROP CONSTRAINT IF EXISTS binary_upgrade_dummy_constraint;",
 			 quote_identifier(PQgetvalue(res, rowno, i_nspname)),
 			 quote_identifier(PQgetvalue(res, rowno, i_relname))));
 	}

 	PQclear(res);

+ 	PQclear(executeQueryOrDie(conn, "RESET client_min_messages;"));
+ 	PQclear(executeQueryOrDie(conn, "RESET log_min_messages;"));
+ 
 	PQfinish(conn);
 }

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index e0b81b9..97a4e22
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
*** static void initialize_reloptions(void);
*** 307,312
--- 307,314
 static void parse_one_reloption(relopt_value *option, char *text_str,
 				int text_len, bool validate);

+ static bool is_valid_reloption(char *name);
+ 
 /*
  * initialize_reloptions
  *		initialization routine, must be called before parsing

*** initialize_reloptions(void)
*** 382,387
--- 384,408
 }

 /*
+  * is_valid_reloption
+  *		check if a reloption exists
+  */
+ static bool
+ is_valid_reloption(char *name)
+ {
+ 	int			i;
+ 
+ 	for (i = 0; relOpts[i]; i++)
+ 	{
+ 		if (pg_strcasecmp(relOpts[i]->name, name) == 0)
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
+ 
+ /*
  * add_reloption_kind
  *		Create a new relopt_kind value, to be used in custom reloptions by
  *		user-defined AMs.

*** transformRelOptions(Datum oldOptions, Li
*** 672,677
--- 693,703
 	if (isReset)
 	{
+ 		if (!is_valid_reloption(def->defname))
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("unrecognized parameter \"%s\"", def->defname)));
+ 
 		if (def->arg != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On 25 August 2014 02:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Thomas Munro wrote: The difficulty of course will be testing all these racy cases reproducibly... Does this help? http://www.postgresql.org/message-id/51fb4305.3070...@2ndquadrant.com The useful trick there is forcing a query to get its snapshot and then go to sleep before actually doing anything, by way of an advisory lock.

Yes it does, thanks Alvaro and Craig. I think the attached spec reproduces the problem using that trick, i.e. it shows NOWAIT blocking, presumably in EvalPlanQualFetch (though I haven't stepped through it with a debugger yet). I'm afraid I'm out of Postgres hacking cycles for a few days, but next weekend I should have a new patch that fixes this by teaching EvalPlanQualFetch about wait policies, with isolation tests for NOWAIT and SKIP LOCKED.

Best regards,
Thomas Munro

# Test NOWAIT with an updated tuple chain.

setup
{
  CREATE TABLE foo (
    id int PRIMARY KEY,
    data text NOT NULL
  );
  INSERT INTO foo VALUES (1, 'x');
}

teardown
{
  DROP TABLE foo;
}

session "s1"
setup { BEGIN; }
step "s1a" { SELECT * FROM foo WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT; }
step "s1b" { COMMIT; }

session "s2"
step "s2a" { SELECT pg_advisory_lock(0); }
step "s2b" { UPDATE foo SET data = data; }
step "s2c" { BEGIN; }
step "s2d" { UPDATE foo SET data = data; }
step "s2e" { SELECT pg_advisory_unlock(0); }
step "s2f" { COMMIT; }

permutation "s2a" "s1a" "s2b" "s2c" "s2d" "s2e" "s1b" "s2f"
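The wait-policy semantics the spec above exercises can be pictured with non-blocking try-locks. The Python below is an analogy only (it invents its own names and is not the server's row-locking code): NOWAIT means "error out instead of blocking on a locked row", while SKIP LOCKED means "silently move on to the next row".

```python
import threading

# Two "rows", one already locked by another "session".
row_locks = {1: threading.Lock(), 2: threading.Lock()}
row_locks[1].acquire()  # another session holds row 1

def select_for_update(row_ids, policy):
    """Collect rows whose lock we can take immediately, per the wait policy."""
    got = []
    for rid in row_ids:
        if row_locks[rid].acquire(blocking=False):
            got.append(rid)
        elif policy == "NOWAIT":
            # NOWAIT: fail fast rather than wait for the lock holder.
            raise RuntimeError("could not obtain lock on row %d" % rid)
        # SKIP LOCKED: just skip the locked row and continue.
    return got

print(select_for_update([1, 2], "SKIP LOCKED"))  # row 1 skipped, row 2 locked
```

The bug under discussion is, in these terms, that following an updated tuple chain could fall back into the blocking path even when the caller asked for the fail-fast behavior.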
Re: [HACKERS] Hardening pg_upgrade
On Mon, Aug 25, 2014 at 02:15:18PM -0400, Bruce Momjian wrote: On Mon, Aug 25, 2014 at 06:34:12AM -0700, Kevin Grittner wrote: Bernd Helmle maili...@oopsware.de wrote: Magnus Hagander mag...@hagander.net wrote: I vote for discarding 8.3 support in pg_upgrade. There are already enough limitations on pg_upgrade from pre-8.4 to make it of questionable value; if it's going to create problems like this, it's time to cut the rope. +1. 8.3 has been unsupported for a fairly long time now, and you can still do a two-step upgrade if you're on that old a version. Also +1 from my side. I've seen some old 8.3 installations at customers, still, but they aren't large and can easily be upgraded with a two-step upgrade. +1 If we could leave it in without any extra work, fine; but once a release is over a year out of support, if it's a matter of putting extra work on the pg hackers or on the users who have chosen to wait more than a year after support ends to do the upgrade, I'm OK with asking those users to do a two-phase upgrade or fall back to pg_dump. It's not like we're leaving them without any options.

OK, I will move in the direction of removing 8.3 support and use a single query to pull schema information. I was hesitant to remove 8.3 support as I know we have kept pg_dump support all the way back to 7.0, but it seems pg_upgrade need not have the same version requirements. I will modify pg_upgrade in three steps:

 o  remove 8.3 support
 o  use a single CTE rather than a temp table
 o  harden the backend to prevent automatic oid assignment

This is all for 9.5.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] postgresql latency bgwriter not doing its job
On 08/25/2014 01:23 PM, Fabien COELHO wrote: Hello pgdevs, I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, to have a responsive server under small to medium load. On an old computer with a software RAID5 HDD attached, pgbench simple update script run for a some time (scale 100, fillfactor 95) pgbench -M prepared -N -c 2 -T 500 -P 1 ... gives 300 tps. However this performance is really +1000 tps for a few seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not. So I think that you're confusing the roles of bgwriter vs. spread checkpoint. What you're experiencing above is pretty common for nonspread checkpoints on slow storage (and RAID5 is slow for DB updates, no matter how fast the disks are), or for attempts to do spread checkpoint on filesystems which don't support it (e.g. Ext3, HFS+). In either case, what's happening is that the *OS* is freezing all logical and physical IO while it works to write out all of RAM, which makes me suspect you're using Ext3 or HFS+. Making the bgwriter more aggressive adds a significant risk of writing the same pages multiple times between checkpoints, so it's not a simple fix. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql latency bgwriter not doing its job
Hi, On 2014-08-25 22:23:40 +0200, Fabien COELHO wrote: seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not. Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above max): pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ... The same thing more or less happens in a delayed fashion... You get 50 tps for some time, followed by sections of 15 seconds at 0 tps for the checkpoint when the segments are full... the server is unresponsive about 10% of the time (one in ten transactions is late by more than 200 ms). That's ext4 I guess? Did you check whether xfs yields, err, more predictable performance? It is not satisfying, pg should be able to handle that load easily. The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that the checkpoint would find a clean field and pass in a blink. While I agree that the current bgwriter implementation is far from good, note that this isn't the bgwriter's job. Its goal is to prevent backends from having to write out buffers themselves, i.e. to ensure that there are clean victim buffers available when the working set exceeds shared_buffers. Note that it would *not* be a good idea to make the bgwriter write out as much as possible - that'd turn sequential write IO into random write IO. Greetings, Andres Freund
Re: [HACKERS] Add CREATE support to event triggers
On Tue, Aug 26, 2014 at 5:33 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: And 4. Yes, they are. I wanted to get trivial stuff out of the way while I had some other trivial patch at hand. I'm dealing with another patch from the commitfest now, so I'm not posting a rebased version right away, apologies. No problems. I imagine that most of the patches still apply. How do people like this patch series? It would be much easier for me to submit a single patch, but I feel handing it in little pieces makes it easier for reviewers. Well, I like the patch series for what it counts as long as you can submit it as such. That's far easier to test and certainly helps in spotting issues when kicking different code paths. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add .NOTPARALLEL to contrib/Makefile
Hi, Currently running make -j16 all check in contrib/ results in a mess because all pg_regress invocations fight over the same port. Adding a simple .NOTPARALLEL: check-%-recurse into contrib/Makefile fixes that. Do we want that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile
Andres Freund and...@2ndquadrant.com writes: Currently running make -j16 all check in contrib/ results in a mess because all pg_regress invocations fight over the same port. Adding a simple .NOTPARALLEL: check-%-recurse into contrib/Makefile fixes that. Do we want that? Dunno, but if we do, it should be applied to installcheck as well. (In that case the fight is over the contrib_regression database.) A larger point is that you shouldn't really fix this only for contrib. What about tests run in src/pl, or contrib vs the main regression tests vs src/pl vs test/isolationtester, etc etc. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Final Patch for GROUPING SETS
"Pavel" == Pavel Stehule pavel.steh...@gmail.com writes:

 Pavel> Hi
 Pavel> I checked this patch, and it is working very well
 Pavel> I found only two issues - I am not sure if they are issues
 Pavel> It duplicates rows
 Pavel> postgres=# explain select name, place, sum(count), grouping(name),
 Pavel> grouping(place) from cars group by rollup(name, place), name;
 Pavel>  GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
 Pavel>    Grouping Sets: (name, place), (name), (name)

I think I can safely claim from the spec that our version is correct. Following the syntactic transformations given in 7.9 <group by clause> of SQL:2008, we have:

GROUP BY rollup(name,place), name;

parses as <rollup list>, <ordinary grouping set>. Syntax rule 13 replaces the rollup list, giving:

GROUP BY GROUPING SETS ((name,place), (name), ()), name;

Syntax rule 16b gives:

GROUP BY GROUPING SETS ((name,place), (name), ()), GROUPING SETS (name);

Syntax rule 16c takes the cartesian product of the two sets:

GROUP BY GROUPING SETS ((name,place,name), (name,name), (name));

Syntax rule 17 gives:

SELECT ... GROUP BY name,place,name
UNION ALL
SELECT ... GROUP BY name,name
UNION ALL
SELECT ... GROUP BY name

Obviously at this point the extra "name" columns become redundant, so we eliminate them (this doesn't correspond to a spec rule, but doesn't change the semantics). So we're left with:

SELECT ... GROUP BY name,place
UNION ALL
SELECT ... GROUP BY name
UNION ALL
SELECT ... GROUP BY name

Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's behavior agrees with my interpretation. Nothing in the spec that I can find licenses the elimination of duplicate grouping sets except indirectly via feature T434 (GROUP BY DISTINCT ...), which we did not attempt to implement.

-- Andrew (irc:RhodiumToad)
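The cartesian-product transformation described above can be sketched mechanically. The following Python is only an illustrative model of the SQL:2008 rules (not the patch's implementation; function names are invented); it shows why GROUP BY rollup(name, place), name yields the duplicate (name) grouping set visible in the EXPLAIN output:

```python
from itertools import product

def rollup(*cols):
    # ROLLUP(a, b) -> ((a, b), (a,), ())   (syntax rule 13)
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def group_by(*items):
    # Each GROUP BY item is a list of grouping sets; a plain column c is [(c,)].
    # Syntax rule 16c: take the cartesian product of the item lists,
    # concatenating the sets; duplicate columns *within* one set are
    # redundant and dropped, but duplicate *sets* are kept.
    result = []
    for combo in product(*items):
        merged = []
        for s in combo:
            for col in s:
                if col not in merged:
                    merged.append(col)
        result.append(tuple(merged))
    return result

sets = group_by(rollup("name", "place"), [("name",)])
print(sets)  # [('name', 'place'), ('name',), ('name',)]
```

The output matches the plan's "Grouping Sets: (name, place), (name), (name)" line, i.e. the duplicate set is a faithful reading of the spec rather than a bug.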
[HACKERS] What in the world is happening with castoroides and protosciurus?
For the last month or so, these two buildfarm animals (which I believe are the same physical machine) have been erratically failing with errors that reflect low-order differences in floating-point calculations. A recent example is at http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-08-25%2010%3A39%3A52 where the only regression diff is:

*** /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/expected/hash_index.out	Mon Aug 25 11:41:00 2014
--- /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/results/hash_index.out	Mon Aug 25 11:57:26 2014
***************
*** 171,179 ****
  SELECT h.seqno AS i8096, h.random AS f1234_1234
     FROM hash_f8_heap h
     WHERE h.random = '-1234.1234'::float8;
!  i8096 | f1234_1234
! -------+------------
!   8906 | -1234.1234
  (1 row)

  UPDATE hash_f8_heap
--- 171,179 ----
  SELECT h.seqno AS i8096, h.random AS f1234_1234
     FROM hash_f8_heap h
     WHERE h.random = '-1234.1234'::float8;
!  i8096 |    f1234_1234
! -------+-------------------
!   8906 | -1234.12356777216
  (1 row)

  UPDATE hash_f8_heap

... a result that certainly makes no sense. The results are not repeatable, failing in equally odd ways in different tests on different runs. This is happening in all the back branches too, not just HEAD. Has there been a system software update on this machine a month or so ago? If not, it's hard to think anything except that the floating-point hardware on this box has developed problems.

regards, tom lane
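One quick way to convince oneself that the bogus value above cannot be a mere rounding or display artifact: the two printed values differ by far more than float64 precision allows near 1234, which points at corrupted bits rather than a formatting change. A small hedged check (the numbers come straight from the diff above):

```python
import math
import struct

stored = -1234.1234          # the value the test stores and compares against
bogus = -1234.12356777216    # the value the failing run printed

# Spacing between adjacent float64 values near this magnitude.
ulp = math.ulp(stored)

# The observed discrepancy, measured in units of that spacing: a display
# or rounding issue would be a handful of ulps, not hundreds of millions.
ulps_apart = abs(stored - bogus) / ulp
print(ulps_apart > 1e6)

# Exact bit pattern of the literal, for comparison against a hex dump.
print(struct.pack(">d", stored).hex())
```

(As a curiosity, the absolute difference is about 1.6777216e-4, i.e. 2^24 scaled by a power of ten, which is at least consistent with a flipped bit somewhere rather than a software rounding change - though that inference is speculative.)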
Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile
On 2014-08-25 20:16:50 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Currently running make -j16 all check in contrib/ results in a mess because all pg_regress invocations fight over the same port. Adding a simple .NOTPARALLEL: check-%-recurse into contrib/Makefile fixes that. Do we want that? Dunno, but if we do, it should be applied to installcheck as well. (In that case the fight is over the contrib_regression database.)

Right. Although you can mostly work around that case using USE_MODULE_DB. The attached patch replaces all hardcoded occurrences of 'contrib_regression' with current_database(), allowing a make -j32 -s installcheck in contrib to succeed in less than 4 seconds... That's not particularly pretty, especially in the CREATE SERVER calls (via DO ... EXECUTE), but imo worth it given the time savings. What's your thought on that one?

A larger point is that you shouldn't really fix this only for contrib. What about tests run in src/pl, or contrib vs the main regression tests vs src/pl vs test/isolationtester, etc etc.

Unfortunately I don't think you can make .NOTPARALLEL work across more than one directory when using recursive make :(. At least I don't know how.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From e81c42ed9eb640dab874bf5778e5e71b485633ba Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 26 Aug 2014 02:54:53 +0200
Subject: [PATCH] Don't hardcode contrib_regression dbname in postgres_fdw and dblink tests.

That allows parallel make installcheck to succeed inside contrib/. The output is not particularly pretty, but it's fast.
--- contrib/dblink/Makefile| 3 -- contrib/dblink/expected/dblink.out | 40 ++--- contrib/dblink/sql/dblink.sql | 41 +++--- contrib/postgres_fdw/Makefile | 3 -- contrib/postgres_fdw/expected/postgres_fdw.out | 8 +++-- contrib/postgres_fdw/sql/postgres_fdw.sql | 8 +++-- 6 files changed, 57 insertions(+), 46 deletions(-) diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile index e5d0cd6..f5ab164 100644 --- a/contrib/dblink/Makefile +++ b/contrib/dblink/Makefile @@ -14,9 +14,6 @@ REGRESS = paths dblink REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress EXTRA_CLEAN = sql/paths.sql expected/paths.out -# the db name is hard-coded in the tests -override USE_MODULE_DB = - ifdef USE_PGXS PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index eee31a2..0dc4838 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -90,7 +90,7 @@ SELECT dblink_build_sql_delete('MySchema.Foo','1 2',2,'{0, a}'); -- regular old dblink SELECT * -FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[]) +FROM dblink('dbname='||current_database(),'SELECT * FROM foo') AS t(a int, b text, c text[]) WHERE t.a 7; a | b | c ---+---+ @@ -116,9 +116,9 @@ DECLARE detail text; BEGIN PERFORM wait_pid(crash_pid) - FROM dblink('dbname=contrib_regression', $$ + FROM dblink('dbname='||current_database(), $$ SELECT pg_backend_pid() FROM dblink( - 'service=test_ldap dbname=contrib_regression', + 'service=test_ldap dbname='||current_database(), -- This string concatenation is a hack to shoehorn a -- set_pgservicefile call into the SQL statement. 
'SELECT 1' || set_pgservicefile('pg_service.conf') @@ -131,7 +131,7 @@ EXCEPTION WHEN OTHERS THEN END $pl$; -- create a persistent connection -SELECT dblink_connect('dbname=contrib_regression'); +SELECT dblink_connect('dbname='||current_database()); dblink_connect OK @@ -267,14 +267,14 @@ WHERE t.a 7; ERROR: connection not available -- put more data into our slave table, first using arbitrary connection syntax -- but truncate the actual return value so we can use diff to check for success -SELECT substr(dblink_exec('dbname=contrib_regression','INSERT INTO foo VALUES(10,''k'',''{a10,b10,c10}'')'),1,6); +SELECT substr(dblink_exec('dbname='||current_database(),'INSERT INTO foo VALUES(10,''k'',''{a10,b10,c10}'')'),1,6); substr INSERT (1 row) -- create a persistent connection -SELECT dblink_connect('dbname=contrib_regression'); +SELECT dblink_connect('dbname='||current_database()); dblink_connect OK @@ -374,7 +374,7 @@ ERROR: could not establish connection DETAIL: missing = after myconn in connection info string -- create a named persistent connection -SELECT dblink_connect('myconn','dbname=contrib_regression'); +SELECT dblink_connect('myconn','dbname='||current_database()); dblink_connect OK @@ -403,10 +403,10 @@ CONTEXT: Error occurred on
Re: [HACKERS] improving speed of make check-world
On 8/25/14 1:32 PM, Heikki Linnakangas wrote: The new EXTRA_INSTALL makefile variable ought to be documented in extend.sgml, where we list REGRESS_OPTS and others. But EXTRA_INSTALL is only of use inside the main source tree, not by extensions. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql latency bgwriter not doing its job
On Monday, August 25, 2014, Fabien COELHO coe...@cri.ensmp.fr wrote: The culprit I found is bgwriter, which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that checkpoint would find a clean field and pass in a blink. Indeed, at the end of the 500 seconds throttled test, pg_stat_bgwriter says: Are you doing pg_stat_reset_shared('bgwriter') after running pgbench -i? You don't want your steady state stats polluted by the bulk load. buffers_checkpoint = 19046 buffers_clean = 2995 Out of curiosity, what does buffers_backend show? In any event, this almost certainly is a red herring. Whichever of the three ways are being used to write out the buffers, it is the checkpointer that is responsible for fsyncing them, and that is where your drama is almost certainly occurring. Writing out with one path rather than a different isn't going to change things, unless you change the fsync. Also, are you familiar with checkpoint_completion_target, and what is it set to? Cheers, Jeff
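The effect of checkpoint_completion_target mentioned above is just pacing arithmetic: instead of dumping every dirty buffer at the deadline, the checkpointer aims to finish writing within that fraction of the checkpoint interval. A back-of-envelope sketch (all figures are invented for illustration, not measurements from the reported system):

```python
def checkpoint_write_rate(dirty_buffers, checkpoint_interval_s, completion_target):
    """Buffers/second the checkpointer must sustain to finish on time."""
    write_window = checkpoint_interval_s * completion_target
    return dirty_buffers / write_window

# 100k dirty buffers, 5-minute checkpoint interval.
burst = checkpoint_write_rate(100_000, 300, 0.05)  # nearly-unspread writes
spread = checkpoint_write_rate(100_000, 300, 0.9)  # spread over 90% of interval
print(round(burst), round(spread))
```

Raising the target cuts the required write rate by more than an order of magnitude in this toy example, which is why spreading the checkpoint (rather than making the bgwriter pre-write everything) is the usual answer to checkpoint-induced latency spikes.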
[HACKERS] Question about coding of free space map
While looking into backend/storage/freespace/freespace.c, I noticed that struct FSMAddress is passed to functions by value, rather than by reference. I thought our coding practice is to define a pointer type for a struct and use that pointer for parameter passing etc.:

typedef struct RelationData *Relation;

IMO freespace.c would do better to follow that practice. Maybe this has been allowed because:

typedef struct
{
	int		level;		/* level */
	int		logpageno;	/* page number within the level */
} FSMAddress;

the struct size is 4+4=8 bytes, which is the same as a 64-bit pointer. Still, I think it's better to use a pointer to the struct, because someday we may want to add new members to the struct.

Best regards,
-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Hardening pg_upgrade
On Mon, Aug 25, 2014 at 06:34:38PM -0400, Bruce Momjian wrote: OK, I will move in the direction of removing 8.3 support and use a single query to pull schema information. I was hesistant to remove 8.3 support as I know we have kept pg_dump support all the way back to 7.0, but it seems pg_upgrade need not have the same version requirements. I will modify pg_upgrade in three steps: o remove 8.3 support o use a single CTE rather than use a temp table o harden the backend to prevent automatic oid assignment This is all for 9.5. Done. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Thu, Apr 17, 2014 at 11:23:24AM +0200, Andres Freund wrote: On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote: On Thu, Feb 6, 2014 at 09:40:32AM +0100, Andres Freund wrote: On 2014-02-05 12:36:42 -0500, Robert Haas wrote: It may well be that your proposal is spot on. But I'd like to see some data-structure-by-data-structure measurements, rather than assuming that alignment must be a good thing. I am fine with just aligning BufferDescriptors properly. That has clearly shown massive improvements. I thought your previous idea of increasing BUFFERALIGN to 64 bytes had a lot to recommend it. Good. I wonder if we shouldn't move that bit of logic: if (size = BUFSIZ) newStart = BUFFERALIGN(newStart); out of ShmemAlloc() and instead have a ShmemAllocAligned() and ShmemInitStructAligned() that does it. So we can sensibly can control it per struct. But that doesn't mean it doesn't need testing. I feel the need here, to say that I never said it doesn't need testing and never thought it didn't... Where are we on this? It needs somebody with time to evaluate possible performance regressions - I personally won't have time to look into this in detail before pgcon. Again, has anyone made any headway on this? Is it a TODO item? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/08/25 21:58), Albe Laurenz wrote: Here is my review: Thank you for the review! I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising. Shouldn't these be similar to the actual execution time, since most of the time is spent in the foreign scan node?

I was also thinking that this is confusing to users. I think this is because the patch executes the UPDATE/DELETE statement during postgresBeginForeignScan, not postgresIterateForeignScan, as you mentioned below:

Reading the code, I noticed that the pushed-down UPDATE or DELETE statement is executed during postgresBeginForeignScan rather than during postgresIterateForeignScan. It probably does not matter, but is there a reason to do it differently from the normal scan?

I'll modify the patch so as to execute the statement during postgresIterateForeignScan.

It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE is pushed down, right? Maybe it would make sense to add an assertion for that.

IIUC, that is right. As ModifyTable doesn't support rescan currently, postgresReScanForeignScan need not be called in the update-pushdown case. The assertion is a good idea. I'll add it.

Thanks,

Best regards, Etsuro Fujita
Re: [HACKERS] Concurrently option for reindexdb
On Tue, Aug 26, 2014 at 5:46 AM, Andres Freund and...@anarazel.de wrote: On August 25, 2014 10:35:20 PM CEST, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: this might be difficult to call this as --concurrently. It might need to be change the name. I'm OK to say that as --concurrently if the document clearly explains that restriction. Or --almost-concurrently? ;P By reading that I am thinking as well about a wording with lock, like --minimum-locks. Why not just finish up the REINDEX CONCURRENTLY patch. +1 +many. Although I'm not sure if we managed to find a safe relation swap. That safe relation swap is possible if an AccessExclusive lock is taken. Right? That means that REINDEX CONCURRENTLY is not completely-concurrently, but I think that many users are satisfied with even this feature. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrently option for reindexdb
On Tue, Aug 26, 2014 at 12:28 PM, Fujii Masao masao.fu...@gmail.com wrote:

+many. Although I'm not sure if we managed to find a safe relation swap.

Well, we didn't AFAIK. With the latest patch provided I could not really find any hole in the logic, and Andres felt that something may be wrong miles away. If I revisited the patch now with a rebased version, maybe I would find something...

That safe relation swap is possible if an AccessExclusive lock is taken. Right? That means that REINDEX CONCURRENTLY is not completely-concurrently, but I think that many users are satisfied with even this feature.

This would also block isolation tests on this feature, something not that welcome for a feature calling itself concurrently, but it would greatly simplify the patch and reduce deadlock occurrences if done right with the exclusive locks (no need to check for past snapshots, as is necessary when using ShareUpdateExclusiveLock?). I left notes on the wiki about the status of this patch: https://wiki.postgresql.org/wiki/Reindex_concurrently Reading this thread, the consensus would be to use an exclusive lock for the swap and be done with it. Well, if there are enough votes for this approach I wouldn't mind resending an updated patch for the next CF.

Regards, -- Michael
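The five steps from the original proposal can be sketched in plain SQL. This is only an illustrative sketch (table and index names are hypothetical), using a name swap as a user-level approximation of the catalog-level swap that pg_repack performs:

```sql
-- Step 1: build the replacement index without a long AccessExclusive lock.
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block.)
CREATE INDEX CONCURRENTLY idx_new ON t (col);

BEGIN;
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- Step 2: block all access briefly
ALTER INDEX idx RENAME TO idx_old;      -- Step 3: swap old and new
ALTER INDEX idx_new RENAME TO idx;
DROP INDEX idx_old;                     -- Step 4: drop the old index
COMMIT;                                 -- Step 5
```

The exclusive lock is only held for the rename/drop phase, which is short; the expensive index build happens concurrently. This is exactly why the thread argues the option is not fully "concurrent": the swap still blocks readers and writers for a moment.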
Re: [HACKERS] SSL renegotiation
Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-11-15 10:43:23 -0500, Tom Lane wrote: Another reason I'm not in a hurry is that the problem we're trying to solve doesn't seem to be causing real-world trouble. So by awhile, I'm thinking let's let it get through 9.4 beta testing. Well, there have been a bunch of customer complaints about it; AFAIR that's what made Alvaro look into it in the first place. So it's not a victimless bug. OK, then maybe end-of-beta is too long. But how much testing will it get during development? I know I never use SSL on development installs. How many hackers do? Just a reminder that I intend to backpatch this (and subsequent fixes). We've gone over two 9.4 betas now. Maybe it'd be a good thing if the beta3 announcement carried a note about enabling SSL with a low ssl_renegotiation_limit setting. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] postgresql latency bgwriter not doing its job
On Tue, Aug 26, 2014 at 1:53 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

Hello pgdevs,

I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, to have a responsive server under small to medium load. On an old computer with a software RAID5 HDD attached, a pgbench simple update script run for some time (scale 100, fillfactor 95):

pgbench -M prepared -N -c 2 -T 500 -P 1 ...

gives 300 tps. However this performance is really +1000 tps for a few seconds followed by 16 seconds at about 0 tps for the checkpoint-induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not. Now, given this setup, if pgbench is throttled at 50 tps (1/6 of the above max):

pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...

The same thing more or less happens in a delayed fashion... You get 50 tps for some time, followed by sections of 15 seconds at 0 tps for the checkpoint when the segments are full... the server is unresponsive about 10% of the time (one in ten transactions is late by more than 200 ms).

I think another thing to know here is why exactly the checkpoint storm causes tps to drop so steeply. One reason could be that backends might need to write more WAL due to full_page_writes; another could be contention around the buffer content_lock. To dig more into the reason, the same tests can be tried with full_page_writes = off and/or synchronous_commit = off to see if WAL writes are causing tps to go down. Similarly for checkpoints, use checkpoint_completion_target to spread the checkpoint writes, as suggested by Jeff as well, to see if that can mitigate the problem.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
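The diagnostic settings suggested above can be collected into a postgresql.conf fragment. Note that full_page_writes = off risks corruption after a crash and synchronous_commit = off trades away durability, so this is strictly for isolating the bottleneck on a disposable test instance, not for production:

```ini
# Diagnostic-only settings to isolate the checkpoint-storm bottleneck
full_page_writes = off               # does FPW-inflated WAL after a checkpoint drive the stall?
synchronous_commit = off             # are WAL flush waits the bottleneck?
checkpoint_completion_target = 0.9   # spread checkpoint writes across the checkpoint interval
```

Toggling these one at a time while re-running the throttled pgbench workload should show which factor dominates the latency spikes.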
Re: [HACKERS] Final Patch for GROUPING SETS
2014-08-26 2:45 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk:

Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

 Pavel> Hi
 Pavel> I checked this patch, and it is working very well.
 Pavel> I found only two issues - I am not sure if it is an issue.
 Pavel> It duplicates rows:

 Pavel> postgres=# explain select name, place, sum(count), grouping(name),
 Pavel> grouping(place) from cars group by rollup(name, place), name;
 Pavel>                      QUERY PLAN
 Pavel> --------------------------------------------------------
 Pavel> GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
 Pavel>   Grouping Sets: (name, place), (name), (name)

I think I can safely claim from the spec that our version is correct. Following the syntactic transformations given in 7.9 <group by clause> of SQL:2008, we have:

GROUP BY rollup(name, place), name;

parses as <rollup list>, <ordinary grouping set>. Syntax rule 13 replaces the rollup list, giving:

GROUP BY GROUPING SETS ((name, place), (name), ()), name;

Syntax rule 16b gives:

GROUP BY GROUPING SETS ((name, place), (name), ()), GROUPING SETS (name);

Syntax rule 16c takes the cartesian product of the two sets:

GROUP BY GROUPING SETS ((name, place, name), (name, name), (name));

Syntax rule 17 gives:

SELECT ... GROUP BY name, place, name
UNION ALL
SELECT ... GROUP BY name, name
UNION ALL
SELECT ... GROUP BY name

Obviously at this point the extra name columns become redundant, so we eliminate them (this doesn't correspond to a spec rule, but doesn't change the semantics). So we're left with:

SELECT ... GROUP BY name, place
UNION ALL
SELECT ... GROUP BY name
UNION ALL
SELECT ... GROUP BY name

Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's behavior agrees with my interpretation. Nothing in the spec that I can find licenses the elimination of duplicate grouping sets except indirectly via feature T434 (GROUP BY DISTINCT ...), which we did not attempt to implement.

ok, I'll try to search in my memory to find some indices that redundant columns should be reduced.

Regards

Pavel

-- Andrew (irc:RhodiumToad)
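The chain of spec transformations above can be checked mechanically. A minimal sketch (not PostgreSQL source; the helper names are mine) that reproduces the Grouping Sets list Pavel observed:

```python
# Sketch of the SQL:2008 GROUP BY transformation applied to
# "GROUP BY rollup(name, place), name".

def rollup(cols):
    # Syntax rule 13: ROLLUP(c1..cn) -> GROUPING SETS ((c1..cn), ..., (c1), ())
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cross(a, b):
    # Syntax rule 16c: cartesian product of two grouping-set lists
    return [x + y for x in a for y in b]

def dedup(cols):
    # Drop redundant repeated columns within a single grouping set
    # (not a spec rule, but semantics-preserving, as argued above)
    seen, out = set(), []
    for c in cols:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return tuple(out)

sets = [dedup(s) for s in cross(rollup(["name", "place"]), [("name",)])]
print(sets)  # [('name', 'place'), ('name',), ('name',)]
```

The result matches the plan's "Grouping Sets: (name, place), (name), (name)", including the duplicated (name) set, which is why the duplicate rows are spec-conformant rather than a bug.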
Re: [HACKERS] SSL renegotiation
On Mon, Aug 25, 2014 at 11:46:13PM -0400, Alvaro Herrera wrote: Tom Lane wrote: OK, then maybe end-of-beta is too long. But how much testing will it get during development? I know I never use SSL on development installs. How many hackers do? Just a reminder that I intend to backpatch this (and subsequent fixes). We've gone over two 9.4 betas now. Maybe it'd be a good thing if the beta3 announcement carried a note about enabling SSL with a low ssl_renegotiation_limit setting. To elaborate on my private comments of 2013-10-11, I share Robert's wariness[1] concerning the magic number of 1024 bytes of renegotiation headroom. Use of that number predates your work, but your work turned exhaustion of that headroom into a FATAL error. Situations where the figure is too small will become disruptive, whereas the problem is nearly invisible today. Network congestion is a factor, so the lack of complaints during beta is relatively uninformative. Disabling renegotiation is a quick workaround, fortunately, but needing to use that workaround will damage users' fragile faith in the safety of our minor releases. My recommendation is to either keep this 9.4-only or do focused load testing to determine the actual worst-case headroom requirement. [1] http://www.postgresql.org/message-id/ca+tgmozvgmyzlx7e4arq_5nu4uden7wrvg1xjxg_o9c61ch...@mail.gmail.com
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Fri, Aug 22, 2014 at 5:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fabrízio de Royes Mello wrote: On Fri, Aug 22, 2014 at 4:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I pointed out, in the email just before pushing the patch, that perhaps we should pass down the new relpersistence flag into finish_heap_swap, and from there we could pass it down to reindex_index() which is where it would be needed. I'm not sure it's worth the trouble, but I think we can still ask Fabrizio to rework that part. I can rework this part if it's a real concern. I guess we can make a better assessment by seeing what it would take. I'm afraid it will turn out to be really ugly. Now, there's some desire to have unlogged indexes on logged tables; I guess if we have that, then eventually there will also be a desire to swap individual indexes from logged to unlogged. Perhaps whatever fix we come up with here would pave the road for that future feature.

Álvaro, I did a refactoring to pass down the relpersistence to finish_heap_swap and then change the catalog inside reindex_index instead of in the rewrite-table phase. That way we can get rid of the function ATChangeIndexesPersistence in src/backend/commands/tablecmds.c. Thoughts?

-- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ee10594..173f412 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1973,7 +1973,7 @@ index_build(Relation heapRelation,
 	 * created it, or truncated twice in a subsequent transaction, the
 	 * relfilenode won't change, and nothing needs to be done here.
 	 */
-	if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+	if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
 		!smgrexists(indexRelation->rd_smgr, INIT_FORKNUM))
 	{
 		RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;
@@ -3099,7 +3099,7 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks)
+reindex_index(Oid indexId, bool skip_constraint_checks, char relpersistence)
 {
 	Relation	iRel,
 				heapRelation;
@@ -3160,6 +3160,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
+	/*
+	 * Check if need to set the new relpersistence
+	 */
+	if (iRel->rd_rel->relpersistence != relpersistence)
+		iRel->rd_rel->relpersistence = relpersistence;
+
 	/* We'll build a new physical relation for the index */
 	RelationSetNewRelfilenode(iRel, InvalidTransactionId,
 							  InvalidMultiXactId);
@@ -3358,11 +3364,19 @@ reindex_relation(Oid relid, int flags)
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
+		char		relpersistence = rel->rd_rel->relpersistence;
 
 		if (is_pg_class)
 			RelationSetIndexList(rel, doneIndexes, InvalidOid);
 
-		reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS));
+		/* Check for flags to force UNLOGGED or PERMANENT persistence */
+		if (flags & REINDEX_REL_FORCE_UNLOGGED)
+			relpersistence = RELPERSISTENCE_UNLOGGED;
+		else if (flags & REINDEX_REL_FORCE_PERMANENT)
+			relpersistence = RELPERSISTENCE_PERMANENT;
+
+		reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
+					  relpersistence);
 
 		CommandCounterIncrement();
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ff80b09..f285026 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -588,7 +588,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	 */
 	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
 					 swap_toast_by_content, false,
 					 true,
-					 frozenXid, cutoffMulti);
+					 frozenXid, cutoffMulti,
+					 OldHeap->rd_rel->relpersistence);
 }
@@ -1474,7 +1475,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 				 bool check_constraints,
 				 bool is_internal,
 				 TransactionId frozenXid,
-				 MultiXactId cutoffMulti)
+				 MultiXactId cutoffMulti,
+				 char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
@@ -1518,6 +1520,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
 	if (check_constraints)
 		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+
+	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+		reindex_flags |= REINDEX_REL_FORCE_UNLOGGED;
+	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+		reindex_flags |= REINDEX_REL_FORCE_PERMANENT;
+
 	reindex_relation(OIDOldHeap, reindex_flags);
 
 	/*
diff --git a/src/backend/commands/indexcmds.c