Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On 09/07/2014 05:58 AM, Peter Geoghegan wrote:

+ Lehman and Yao don't require read locks, but assume that in-memory
+ copies of tree pages are unshared. Postgres shares in-memory buffers
+ among backends. As a result, we do page-level read locking on btree
+ pages in order to guarantee that no record is modified while we are
+ examining it. This reduces concurrency but guarantees correct
+ behavior. An advantage is that when trading in a read lock for a
+ write lock, we need not re-read the page after getting the write lock.
+ Since we're also holding a pin on the shared buffer containing the
+ page, we know that buffer still contains the page and is up-to-date.

This is the existing paragraph, just moved to a different place in the README.

+ Although it could be argued that Lehman and Yao isn't followed to the
+ letter because single pages are read locked as the tree is descended,
+ this is at least necessary to support deletion, a common requirement
+ which L&Y hardly acknowledge. Read locks also ensure that B-tree
+ pages are self-consistent (L&Y appear to assume atomic page reads and
+ writes).

This is just duplicating the existing paragraph. I don't see the point of this.

+ Even with these read locks, following L&Y obviates the need
+ of earlier schemes to hold multiple locks concurrently when descending
+ the tree as part of servicing index scans (pessimistic lock coupling).
+ The addition of right-links at all levels, as well as the addition of
+ a page high key allows detection and dynamic recovery from
+ concurrent page splits (that is, splits between unlocking an internal
+ page, and subsequently locking its child page during a descent). When
+ a page is first locked (at every level of a descent servicing an index
+ scan), we consider the need to move right: if the scankey value is
+ greater than (or sometimes greater than or equal to) the page's existing
+ highkey value, a value which serves as an upper bound for values on
+ the page generally, then it must be necessary to move the scan to the
+ right-hand page on the same level. It's even possible that the scan
+ needs to move right more than once. Once the other session's
+ concurrent page split finishes, a downlink will be inserted into the
+ parent, and so assuming there are no further page splits, future index
+ scans using the same scankey value will not need to move right. L&Y
+ trees are sometimes referred to as B-Link trees in the literature.

This explains in a few sentences what a L&Y B-tree looks like. The current README assumes that you're already familiar with what a L&Y tree looks like, or that you go read the paper mentioned at the top of the README. It might be a good idea to expand on that, and add an introduction like this so that an unfamiliar reader doesn't need to read the L&Y paper first. Is that the purpose of this patch? Please make it more explicit. And please make the sentences simpler - the above reads like a Shakespeare play. Something like:

The basic Lehman & Yao Algorithm

Compared to a classic B-tree, L&Y adds a right-link pointer to each page, to the page's right sibling. It also adds a high key to each page, which is an upper bound on the keys that are allowed on that page. These two additions make it possible to detect a concurrent page split, which allows the tree to be searched without holding any read locks (except to keep a single page from being modified while reading it).

When a search follows a downlink to a child page, it compares the page's high key with the search key.
If the search key is greater than the high key, the page must've been split concurrently, and you must follow the right-link to find the new page containing the key range you're looking for. This might need to be repeated, if the page has been split more than once.

Differences to the Lehman & Yao algorithm
(current "Lehman and Yao Algorithm and Insertions" section)

I think that's pretty much the same information you added, but it's in a separate section, with the clear purpose that it explains what a L&Y tree looks like. You can skip over it, if you have read the paper or are otherwise already familiar with it. It still assumes that you're familiar with B-trees in general.

Anyway, I see that you had resurrected this in the commitfest app after three weeks of inactivity. I'm going to mark this back to Returned with Feedback. Please don't resurrect it again, this patch has received more than its fair share of attention. Instead, please help by signing up to review a patch. The commitfest progress is glacial at the moment, and we really need experienced reviewers like you to bring closure to people's patches.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
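For illustration, the move-right logic described above can be sketched in a few lines of C. This is a simplified model, not the actual nbtree code; the type and the field names are hypothetical:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical in-memory page; not the real nbtree page layout. */
    typedef struct Page
    {
        int          high_key;      /* upper bound for keys on this page */
        bool         has_high_key;  /* the rightmost page has no high key */
        struct Page *right_link;    /* right sibling on the same level */
    } Page;

    /*
     * After following a downlink (or a right-link), check whether a
     * concurrent split moved our key range to the right, and keep moving
     * right until this page's high key covers the search key.  In Postgres
     * each step would also swap a read lock from one page to the next.
     */
    static Page *
    move_right_if_needed(Page *page, int key)
    {
        while (page->has_high_key && key > page->high_key)
            page = page->right_link;    /* may repeat after several splits */
        return page;
    }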
Re: [HACKERS] proposal: plpgsql - Assert statement
On 2014-09-09 07:54, Craig Ringer wrote: On 09/05/2014 05:21 PM, Pavel Stehule wrote: *shrug* Doing it in SQL would probably break more stuff. I'm trying to contain the damage. And arguably, this is mostly only useful in PL/PgSQL.

I've wanted assertions in SQL enough that I often write trivial wrappers around `raise` in PL/PgSQL for use in `CASE` statements etc.

Yeah, as have I. I've also advocated that there should be a raise_exception(text, anyelement) anyelement function shipped with postgres. But this is something different; this is just a single statement which asserts that some expression evaluates to true. Even if we allowed it to be used as a scalar expression, there's still the problem that anyelement is commonly used to work around.

.marko

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP armor headers
On 08/15/2014 11:55 AM, Marko Tiikkaja wrote: Hi, On 8/8/14 3:18 PM, I wrote: Currently there's no way to generate or extract armor headers from the PGP armored format in pgcrypto. I've written a patch to add the support. Latest version of the patch here, having fixed some small coding issues.

This coding style gives me the willies:

    guess_len = pgp_armor_enc_len(data_len, num_headers, keys, values);
    res = palloc(VARHDRSZ + guess_len);
    res_len = pgp_armor_encode((uint8 *) VARDATA(data), data_len,
                               (uint8 *) VARDATA(res),
                               num_headers, keys, values);
    if (res_len > guess_len)
        ereport(ERROR,
                (errcode(ERRCODE_EXTERNAL_ROUTINE_INVOCATION_EXCEPTION),
                 errmsg("Overflow - encode estimate too small")));

That was OK before this patch, as the length calculation was simple enough to verify (although if I were writing it from scratch, I would've written it differently). But with this patch, it gets a lot more complicated, and I can't easily convince myself that it's correct. pgp_armor_enc_len might be vulnerable to integer overflow. Consider 1GB worth of keys, 1GB worth of values, and 1GB worth of data. I'm not sure if you can quite make it overflow a 32-bit unsigned integer, but at least you can get nervously close, e.g. if you use max-sized key/value arrays, with a single byte in each key and value. Oh, and if you use a single-byte server encoding and characters that get expanded to multiple bytes in UTF-8, you can go higher.

So I think this (and the corresponding dearmor code too) should be refactored to use a StringInfo that gets enlarged as needed, instead of hoping to guess the size correctly beforehand. To ease review, it might make sense to do that as a separate patch over current sources, and the main patch on top of that.

BTW, I'm surprised that there is no function to get all the armor headers. You can only probe for a particular one with pgp_armor_header, but there is no way to list them all, if you don't know what you're looking for.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP armor headers
On 9/9/14 10:54 AM, Heikki Linnakangas wrote: So I think this (and the corresponding dearmor code too) should be refactored to use a StringInfo that gets enlarged as needed, instead of hoping to guess the size correctly beforehand. To ease review, it might make sense to do that as a separate patch over current sources, and the main patch on top of that.

Yeah, I thought the same thing while writing that code. Thanks, I'll do it that way.

BTW, I'm surprised that there is no function to get all the armor headers. You can only probe for a particular one with pgp_armor_header, but there is no way to list them all, if you don't know what you're looking for.

Right. The assumption was that due to the nature of how the headers are used, the user would always be looking for a certain, predetermined set of values. But I don't oppose adding a pgp_armor_header_keys() function to get the set of all keys, for example.

.marko

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
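As a rough sketch of the suggested refactoring: the encoder could append into a StringInfo, which grows as needed. Only the StringInfo calls below are the real backend API; the function itself and the armor details are simplified assumptions:

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /*
     * Simplified sketch: emit armor headers into a growable buffer instead
     * of a pre-sized palloc'd area, so no size estimate is needed at all.
     */
    static void
    append_armor_headers(StringInfo buf, int num_headers,
                         char **keys, char **values)
    {
        int         i;

        appendStringInfoString(buf, "-----BEGIN PGP MESSAGE-----\n");
        for (i = 0; i < num_headers; i++)
            appendStringInfo(buf, "%s: %s\n", keys[i], values[i]);
        appendStringInfoChar(buf, '\n');
        /* ... the base64-encoded body and CRC would be appended here ... */
    }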
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-09-04 08:18:37 -0400, Robert Haas wrote: On Tue, Aug 5, 2014 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 6, 2014 at 3:12 PM, Andres Freund and...@2ndquadrant.com wrote: If you want to do that, it's fine with me. What I would do is:

- Back-patch the addition of the sparcv8+ stuff all the way. If anyone's running anything older, let them complain...
- Remove the special case for MIPS without gcc intrinsics only in master, leaving the back-branches broken. If anyone cares, let them complain...
- Nothing else.

I've gone ahead and done the second of these things.

Thanks. Andres, do you want to go take a stab at fixing the SPARC stuff?

Will do, will probably take me till Thursday to come up with the brain cycles.

Ping? This has been pending for almost two months now and, at your request, my patch to make spinlocks act as compiler barriers is waiting behind it. Can we please get this moving again soon, or can I commit that patch and you can fix this when you get around to it?

I finally pushed this. And once more I seriously got pissed at the poor overall worldwide state of documentation and continuously changing terminology around this. Sorry for taking this long :(

Do you have a current version of your patch to make them compiler barriers?

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
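For context, the "spinlocks as compiler barriers" patch under discussion is about preventing the compiler from reordering loads and stores across lock operations. A minimal illustration of the mechanism, not the actual s_lock.h code:

    /*
     * An empty asm with a "memory" clobber tells the compiler it may not
     * cache or reorder memory accesses across this point.  If spinlock
     * acquire and release include such a barrier, code in the critical
     * section cannot be moved out of it at compile time.
     */
    #define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

    static volatile int my_lock;

    static void
    my_unlock(void)
    {
        compiler_barrier();     /* keep protected stores before the release */
        my_lock = 0;
    }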
Re: [HACKERS] LIMIT for UPDATE and DELETE
On 09/03/2014 06:39 PM, Robert Haas wrote: On Wed, Sep 3, 2014 at 11:02 AM, Marko Tiikkaja ma...@joh.to wrote: On 9/3/14 4:46 PM, Robert Haas wrote: Making it suck more because you don't think it's as important as your feature is, in my opinion, not cool.

I really can't see how that would make inheritance suck *more*. You can't do UPDATE .. ORDER BY now, and you wouldn't be able to do it after the patch. Yeah, sure, perhaps people using inheritance might feel left out, but surely that would just motivate them to work on improving it.

I think it's entirely reasonable for us to require that all new SQL features should be required to work with or without inheritance. If we took the opposite position, and said that the only things that need to work with inheritance are the ones that existed at the time inheritance was introduced, then we'd not need to worry about it not only for this feature but for row-level security and SKIP LOCKED and GROUPING SETS and, going back a bit further, materialized views and security-barrier views and LATERAL and CTEs and on and on. Perhaps not all of those require any special handling for inheritance hierarchies, but some of them surely did, and if even 10% of the SQL features that we have added since table inheritance were allowed to opt out of supporting it, we'd have a broken and unusable feature today. Now some people might argue that we have that anyway, but the fact is that a lot of people are using inheritance today, even with all its flaws, and they wouldn't be if there were a long laundry list of limitations that didn't apply to regular tables. We should be looking to lift the limitations that currently exist, not add more.

I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should work with inheritance. So the path forward is (using Marko's phrasing upthread):

1) We put the LIMIT inside ModifyTable like this patch does. This doesn't prevent us from doing ORDER BY in the future, but helps numerous people who today have to work around its absence.

2) Someone rewrites how UPDATE works based on Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us, which allows us to support ORDER BY on all tables (perhaps not FDWs, I don't know how those work). The LIMIT functionality in this patch is unaffected.

What's not clear to me is whether it makes sense to do 1) without 2)? Is UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we apply this patch now, how much of it needs to be rewritten after 2)? If the answers are "yes" and "not much", then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should be done first.

Etsuro, Kaigei, please take a look at the patch and try to make a guess on how much of this still needs to be rewritten if we do 2). If not much, then please continue to review it, with the aim of getting it into a committable state.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Tue, Sep 9, 2014 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Sep 8, 2014 at 2:05 AM, Amit Kapila amit.kapil...@gmail.com wrote: Another point about error handling is that to execute the sql in function pg_background_worker_main(), it starts the transaction which I think doesn't get aborted if an error occurs.

For this I was just referring to the error handling code of StartBackgroundWorker(); however, during shutdown of the process, it will call AbortOutOfAnyTransaction() which will take care of aborting the transaction.

Yeah.

and similarly handling for timeout seems to be missing in the error path. As we are anyway going to exit on error, I am not sure if this will be required; however, having it for a clean exit seems to be better.

Can you be more specific?

I was thinking to handle errors similar to what PostgresMain does; however, if it is going to exit, then it might not be worth it.

I'm generally of the view that there's little point in spending time cleaning things up that will go away anyway when the process exits.

Yeah, in that case it might not make much sense. The argument here could be why it has to exit: why can't it wait till the launcher asks it to exit? You have mentioned in a previous mail that you want to stick to the approach taken by the patch, however it is not clear why you think that is better. If I try to think about how the worker backend should behave in case the master backend assigns some task to the worker, then I think it would be sensible for the worker to not exit after completing its task (whether it has finished the execution of the work assigned or otherwise), as the master backend can assign further tasks to the same worker. Another advantage would be that the setup and teardown cost of the worker will be saved. Now if we just think w.r.t. pg_background, it might not be so important to let the worker wait till the launcher asks it to quit; however, as you have mentioned in your mail upthread that you are expecting this kind of set-up for actual parallelism, it might not be completely insensible to expect the worker to wait till it has been asked to quit.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 9, 2014 at 3:11 AM, Thom Brown t...@linux.com wrote: On 5 September 2014 14:19, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Sep 5, 2014 at 5:17 PM, Amit Kapila amit.kapil...@gmail.com wrote: Apart from the above, I think for this patch a catversion bump is required, as I have modified the system catalogs. However I have not done the same in the patch, as otherwise it will be a bit difficult to take performance data.

One regression test failed on Linux due to a spacing issue, which is fixed in the attached patch.

Here's a set of test results against this patch:

Many thanks for taking the performance data. This data clearly shows that there is a performance improvement even at a lower configuration; however, the real benefit of the patch can be seen with a higher-core machine and with larger RAM (which can contain all the data).

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Escaping from blocked send() reprised.
Hi, I added and edited some comments. Sorry, the patch contained a silly bug. Please find the attached one.

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

From eb91a7c91e1fd3b24bf5bff0eb885f1c3d274637 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Fri, 5 Sep 2014 17:21:48 +0900
Subject: [PATCH] Simply cut off the socket if signalled during sending to client.

---
 src/backend/libpq/be-secure.c | 21 ++--
 src/backend/libpq/pqcomm.c    | 13 +++
 src/backend/tcop/postgres.c   | 71 ++---
 src/include/libpq/libpq.h     |  1 +
 4 files changed, 70 insertions(+), 36 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..3006697 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,11 +145,11 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
-	prepare_for_client_read();
+	prepare_for_client_comm();
 
 	n = recv(port->sock, ptr, len, 0);
 
-	client_read_ended();
+	client_comm_ended();
 
 	return n;
 }
@@ -178,5 +178,20 @@ secure_write(Port *port, void *ptr, size_t len)
 ssize_t
 secure_raw_write(Port *port, const void *ptr, size_t len)
 {
-	return send(port->sock, ptr, len, 0);
+	ssize_t		n;
+
+	/*
+	 * If we get interrupted during a send that would not have blocked,
+	 * processing the interrupt immediately throws away the chance to
+	 * complete sending the bytes we were handed; but the chance to send
+	 * one more tuple, or maybe the final bytes, matters less than the
+	 * risk that we can never bail out of a blocking send.
+	 */
+	prepare_for_client_comm();
+
+	n = send(port->sock, ptr, len, 0);
+
+	client_comm_ended();
+
+	return n;
 }
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..9b08529 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1343,6 +1343,19 @@ pq_is_send_pending(void)
 }
 
 /*
+ * pq_is_busy - is there any I/O command running?
+ *
+ * This function is intended for use within signal handlers to check if
+ * any pqcomm I/O operation is under execution.
+ */
+bool
+pq_is_busy(void)
+{
+	return PqCommBusy;
+}
+
+/*
  * Message-level I/O routines begin here.
  *
  * These routines understand about the old-style COPY OUT protocol.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..b29b200 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -303,16 +303,16 @@ InteractiveBackend(StringInfo inBuf)
  *
  * Even though we are not reading from a client process, we still want to
  * respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * prepare_for_client_comm and client_comm_ended.
  */
 static int
 interactive_getc(void)
 {
 	int			c;
 
-	prepare_for_client_read();
+	prepare_for_client_comm();
 	c = getc(stdin);
-	client_read_ended();
+	client_comm_ended();
 	return c;
 }
@@ -487,53 +487,47 @@ ReadCommand(StringInfo inBuf)
 }
 
 /*
- * prepare_for_client_read -- set up to possibly block on client input
+ * prepare_for_client_comm -- set up to possibly block on client communication
  *
- * This must be called immediately before any low-level read from the
- * client connection. It is necessary to do it at a sufficiently low level
- * that there won't be any other operations except the read kernel call
- * itself between this call and the subsequent client_read_ended() call.
+ * This must be called immediately before any low-level read from or write to
+ * the client connection. It is necessary to do it at a sufficiently low
+ * level that there won't be any other operations except the read/write kernel
+ * call itself between this call and the subsequent client_comm_ended() call.
  * In particular there mustn't be use of malloc() or other potentially
- * non-reentrant libc functions. This restriction makes it safe for us
- * to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
+ * non-reentrant libc functions. This restriction makes it safe for us to
+ * allow interrupt service routines to execute nontrivial code while we are
+ * waiting for input or blocking of output.
  */
 void
-prepare_for_client_read(void)
+prepare_for_client_comm(void)
 {
-	if (DoingCommandRead)
-	{
-		/* Enable immediate processing of asynchronous signals */
-		EnableNotifyInterrupt();
-		EnableCatchupInterrupt();
+	/* Enable immediate processing of asynchronous signals */
+	EnableNotifyInterrupt();
+	EnableCatchupInterrupt();
 
-		/* Allow cancel/die interrupts to be processed while waiting */
-		ImmediateInterruptOK = true;
+	/* Allow cancel/die interrupts to be processed while waiting */
+	ImmediateInterruptOK = true;
 
-		/*
Re: [HACKERS] LIMIT for UPDATE and DELETE
On 9/9/14 11:57 AM, Heikki Linnakangas wrote: What's not clear to me is whether it makes sense to do 1) without 2)? Is UPDATE .. LIMIT without support for an ORDER BY useful enough?

I'd say so; I could use it right now. I have to remove millions of lines from a table, but I don't want the live transaction processing to take a hit, so I have to do it in batches. Granted, some kind of rate limiting would achieve the same thing for DELETE, but with UPDATE you also have to consider row locking, and rate limiting wouldn't help with that at all; it would, in fact, just make it worse. I'll also be running a big UPDATE like that later today, so that's two uses today alone for me. And no, these are not routine things, so keep your "use partitions" suggestions to yourselves.

And if we apply this patch now, how much of it needs to be rewritten after 2)? If the answers are "yes" and "not much", then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should be done first.

I'd say not much, if anything at all.

.marko

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LIMIT for UPDATE and DELETE
On 9/9/14 12:37 PM, I wrote: And no, these are not routine things so keep your use partitions suggestions to yourselves. My apologies. This was not directed at you personally, Heikki, and in any case it was unnecessarily hostile. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On 09/05/2014 06:38 PM, Jan Wieck wrote: On 09/05/2014 10:12 AM, Fabien COELHO wrote: Note that despite pg's appalling latency performance, it may stay well over the 90% limit, or even 100%: when things are going well a lot of transactions run in about a millisecond, while when things are going bad transactions would take a long time (although possibly under or about 1s anyway), *but* very few transactions are processed, so the throughput is very small. The fact that during 15 seconds only 30 transactions are processed is a detail that does not show up in the metric.

Yeah, it makes much more sense to measure the latency from the scheduled time than the actual time.

I haven't used the real pgbench for a long time. I will have to look at your patch and see what the current version actually does or does not. What I have been using is a Python version of pgbench that I wrote for myself when I started learning that language. That one does record both values, the DB transaction latency and the client response time (time from the request being entered into the queue until transaction commit). When I look at those results it is possible to have an utterly failing run, with 60% of client response times being within 2 seconds, but all the DB transactions are still in milliseconds.

I think we have to reconsider what we're reporting in 9.4, when --rate is enabled, even though it's already very late in the release cycle. It's a bad idea to change the definition of latency between 9.4 and 9.5, so let's get it right in 9.4.

As said, I'll have to take a look at it. Since I am on vacation next week, getting ready for my first day at EnterpriseDB, this may actually happen.

Oh, congrats! :-)

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LIMIT for UPDATE and DELETE
On 09/09/2014 01:46 PM, Marko Tiikkaja wrote: On 9/9/14 12:37 PM, I wrote: And no, these are not routine things so keep your use partitions suggestions to yourselves. My apologies. This was not directed at you personally, Heikki, and in any case it was unnecessarily hostile. No worries, it made me smile :-) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On 09/09/2014 01:49 PM, Heikki Linnakangas wrote: I think we have to reconsider what we're reporting in 9.4, when --rate is enabled, even though it's already very late in the release cycle. It's a bad idea to change the definition of latency between 9.4 and 9.5, so let's get it right in 9.4.

As per the attached patch. I think we should commit this to 9.4. Any objections?

The text this patch adds to the documentation needs some rewording, though. As does this existing paragraph:

High rate limit schedule lag values, that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. One possible cause of schedule lag is insufficient pgbench threads to handle all of the clients. To improve that, consider reducing the number of clients, increasing the number of threads in pgbench, or running pgbench on a separate host. Another possibility is that the database is not keeping up with the load at some point. When that happens, you will have to reduce the expected transaction rate to lower schedule lag.

It took me ages to parse "high rate limit schedule lag values".

- Heikki

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2f7d80e..bf9c4ea 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1087,8 +1087,22 @@ top:
 			int64		latency;
 
 			INSTR_TIME_SET_CURRENT(diff);
-			INSTR_TIME_SUBTRACT(diff, st->txn_begin);
-			latency = INSTR_TIME_GET_MICROSEC(diff);
+
+			if (throttle_delay)
+			{
+				/*
+				 * Under throttling, compute latency from the scheduled start
+				 * time, not the actual transaction start.
+				 */
+				int64		now = INSTR_TIME_GET_MICROSEC(diff);
+
+				latency = now - thread->throttle_trigger;
+			}
+			else
+			{
+				INSTR_TIME_SUBTRACT(diff, st->txn_begin);
+				latency = INSTR_TIME_GET_MICROSEC(diff);
+			}
+
 			st->txn_latencies += latency;
 
 			/*
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b479105..75fd8ec 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -403,8 +403,10 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
        Show progress report every <literal>sec</> seconds. The report
        includes the time since the beginning of the run, the tps since the
        last report, and the transaction latency average and standard
-       deviation since the last report. Under throttling (<option>-R</>), it
-       also includes the average schedule lag time since the last report.
+       deviation since the last report. Under throttling (<option>-R</>),
+       the latency is computed with respect to the transaction scheduled
+       start time, not the actual transaction beginning time, thus it also
+       includes the average schedule lag time.
       </para>
      </listitem>
    </varlistentry>
@@ -452,6 +454,17 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
    output.
   </para>
   <para>
+   Also, when throttling is active, the transaction latency reported
+   at the end of the run is computed from the scheduled start time, not
+   the actual transaction start time, so it includes the lag time.
+   The perspective is from the client point of view who wanted to perform
+   a transaction but could not even start it for some time. Very high
+   latencies may be reported when the transactions are lagging behind
+   schedule. The transaction latency with respect to the actual
+   transaction start time can be computed by subtracting the transaction
+   lag time from the reported latency.
+  </para>
+  <para>
    High rate limit schedule lag values, that is lag values that are large
    compared to the actual transaction latency, indicate that something is
    amiss in the throttling process. High schedule lag can highlight a subtle

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: BRIN indexes (was Re: [HACKERS] Minmax indexes)
On 09/08/2014 07:02 PM, Alvaro Herrera wrote: Here's version 18. I have renamed it: these are now BRIN indexes. I have fixed numerous race conditions and deadlocks. In particular I fixed this problem you noted:

Heikki Linnakangas wrote: Another race condition: if a new tuple is inserted to the range while summarization runs, it's possible that the new tuple isn't included in the tuple that the summarization calculated, nor does the insertion itself update it.

I did it mostly in the way you outlined, i.e. by way of a placeholder tuple that gets updated by concurrent inserters, and then the tuple resulting from the scan is unioned with the values in the updated placeholder tuple. This required the introduction of one extra support proc for opclasses (pretty simple stuff anyhow).

Hmm. So the union support proc is only called if there is a race condition? That makes it very difficult to test, I'm afraid.

It would make more sense to pass BrinValues to the support functions, rather than DeformedBrTuple. An opclass's support function should never need to access the values for other columns.

Does minmaxUnion handle NULLs correctly?

minmaxUnion pfrees the old values. Is that necessary? What memory context does the function run in? If the code runs in a short-lived memory context, you might as well just let them leak. If it runs in a long-lived context, well, perhaps it shouldn't. It's nicer to write functions that can leak freely. IIRC, GiST and GIN run the support functions in a temporary context. In any case, it might be worth noting explicitly in the docs which functions may leak and which may not.

If you add a new datatype, and define b-tree operators for it, what is required to create a minmax opclass for it? Would it be possible to generalize the functions in brin_minmax.c so that they can be reused for any datatype (with b-tree operators) without writing any new C code? I think we're almost there; the only thing that differs between each data type is the opcinfo function. Let's pass the type OID as argument to the opcinfo function. You could then have just a single minmax_opcinfo function, instead of the macro to generate a separate function for each built-in datatype.

In general, this patch is in pretty good shape now, thanks!

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
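To make the union semantics under discussion concrete, here is a sketch with hypothetical types - not the actual BRIN API, which works on BrinValues. Unioning two minmax summaries keeps the smaller minimum and the larger maximum, and the NULL-ness flags must be merged too, which is exactly the kind of case that is hard to exercise without hitting the race:

    #include <stdbool.h>

    /* Hypothetical minmax summary for one column of one page range. */
    typedef struct MinmaxSummary
    {
        bool    allnulls;   /* no non-null values seen in the range */
        bool    hasnulls;   /* at least one null seen in the range */
        int     min;
        int     max;
    } MinmaxSummary;

    /* Merge summary b into a, so a covers everything both covered. */
    static void
    minmax_union(MinmaxSummary *a, const MinmaxSummary *b)
    {
        if (b->hasnulls || b->allnulls)
            a->hasnulls = true;
        if (b->allnulls)
            return;                 /* no min/max to merge from b */
        if (a->allnulls)
        {
            a->allnulls = false;
            a->min = b->min;
            a->max = b->max;
            return;
        }
        if (b->min < a->min)
            a->min = b->min;
        if (b->max > a->max)
            a->max = b->max;
    }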
Re: [HACKERS] [TODO] Process pg_hba.conf keywords as case-insensitive
Hello, I will be the reviewer of this patch.

Your approach of coloring tokens seems right, but you have broken the parse logic by adding your code. Other than the mistakes others have pointed out, I found that:

- For non-SQL-ident-like tokens, the token style (quoted or not) is ignored, so the following line works:

| local All aLL trust

I suppose this is not what you intended. This is because you have ignored the attribute of a token when comparing it as a non-SQL-ident token.

- A '+' at the head of the sequence '+"' is treated as the first character of the *quoted* string, e.g. +"hoge" is tokenized as +hoge:special_quoted. This is because you simply continued processing for '+' without discarding and skipping the '+', and without setting in_quote, so the following parser code does not behave as intended.

You should understand what the original code does, and insert or modify logic without breaking its assumptions.

With this patch, database (and role?) names are compared case-insensitively. For example:

local MixedDB all trust
local mixedDB all reject

psql -d mixedDB
psql (9.5devel)
Type "help" for help.

mixedDB=#

That connection should've matched that 2nd line, and been rejected.

Actually it should have matched neither, as both lines will get folded to lower case:

local mixeddb all trust
local mixeddb all reject

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
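A small sketch of the intended case-folding rule (illustrative only, not the actual hba.c code): only unquoted tokens are downcased before comparison, so "MixedDB" keeps its case while an unquoted MixedDB folds to mixeddb:

    #include <ctype.h>
    #include <stdbool.h>

    /* Fold an unquoted pg_hba.conf token to lower case, in place. */
    static void
    fold_token(char *tok, bool was_quoted)
    {
        if (was_quoted)
            return;             /* quoted tokens keep their case */
        for (; *tok; tok++)
            *tok = (char) tolower((unsigned char) *tok);
    }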
Re: [HACKERS] posix_fadvise() and pg_receivexlog
Hi

Well, I'd like to hear someone from the field complaining that pg_receivexlog is thrashing the cache and thus reducing the performance of some other process. Or at least a synthetic test case that demonstrates that happening.

It's not with pg_receivexlog but it's related. On a small box without a replication server connected, performance was good enough, but not so with a replication server connected: there was 1GB worth of WAL sitting in RAM vs next to nothing without a slave!

setup:
8GB RAM
2GB shared_buffers (smaller has other issues)
checkpoint_segments 40 (a smaller value triggers too many xlog checkpoints)
checkpoints spread over 10 min, writing 30 to 50% of shared buffers.
live data set fits in RAM.
constant load.

At application startup (1 or 2 per hour), applications were running requests on cold data, which were now saturating IO. I'm not sure it's an OS bug, as the WAL was 'hotter' than the cold data. A cron task every minute with vmtouch -e to evict old WAL files from memory solved the issue.

Regards

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [TODO] Process pg_hba.conf keywords as case-insensitive
Sorry for the wrong suggestion. Setting in_quote is wrong there because it's before the beginning quote. However, advancing the read pointer and replacing c with the next value is still needed.

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

2014/09/09 20:49 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp:

Hello, I will be the reviewer of this patch.

Your approach of coloring tokens seems right, but you have broken the parse logic by adding your code. Other than the mistakes others have pointed out, I found that:

- For non-SQL-ident-like tokens, the token style (quoted or not) is ignored, so the following line works:

| local All aLL trust

I suppose this is not what you intended. This is because you have ignored the attribute of a token when comparing it as a non-SQL-ident token.

- A '+' at the head of the sequence '+"' is treated as the first character of the *quoted* string, e.g. +"hoge" is tokenized as +hoge:special_quoted. This is because you simply continued processing for '+' without discarding and skipping the '+', and without setting in_quote, so the following parser code does not behave as intended.

You should understand what the original code does, and insert or modify logic without breaking its assumptions.

With this patch, database (and role?) names are compared case-insensitively. For example:

local MixedDB all trust
local mixedDB all reject

psql -d mixedDB
psql (9.5devel)
Type "help" for help.

mixedDB=#

That connection should've matched that 2nd line, and been rejected.

Actually it should have matched neither, as both lines will get folded to lower case:

local mixeddb all trust
local mixeddb all reject

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
Hello Heikki,

I think we have to reconsider what we're reporting in 9.4, when --rate is enabled, even though it's already very late in the release cycle. It's a bad idea to change the definition of latency between 9.4 and 9.5, so let's get it right in 9.4.

Indeed.

As per the attached patch. I think we should commit this to 9.4. Any objections?

Ok for me. Some more suggestions about the doc below.

The text this patch adds to the documentation needs some rewording, though.

Probably. Not sure how to improve it.

As does this existing paragraph: High rate limit schedule lag values, that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. One possible cause of schedule lag is insufficient pgbench threads to handle all of the clients. To improve that, consider reducing the number of clients, increasing the number of threads in pgbench, or running pgbench on a separate host. Another possibility is that the database is not keeping up with the load at some point. When that happens, you will have to reduce the expected transaction rate to lower schedule lag. It took me ages to parse "high rate limit schedule lag values".

Indeed, I'm not proud of that one... Moreover the first sentence becomes false with the new latency computation, as the lag time is included. I would suggest:

When under throttling, the reported lag time measures the delay between the scheduled start time for the transaction and its actual start time. A high value, where the lag time represents most of the transaction latency, may indicate that something is amiss in the throttling process itself, even if the target rate is met in the end. One possible cause ...

-- Fabien.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Sun, Aug 17, 2014 at 7:46 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! ISTM that I failed to make the patch from my git repository... Attached is the rebased version. I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Thanks for testing the patch! Attached is the updated version of the patch. Previously the patch depended on another infrastructure patch (which allows a user to specify the unit in reloption (*1)). But that infrastructure patch has serious problem and it's not easy to fix the problem. So I changed the patch so that it doesn't depend on that infrastructure patch at all. Even without the infrastructure patch, the feature that this patch introduces is useful. Also I added the regression test into the patch. (*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes too large !(larger than xref linkend=guc-work-mem), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional --- 728,736 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than !literalPENDING_LIST_CLEANUP_SIZE/literal (or !xref linkend=guc-work-mem if not set), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional *** *** 812,829 /varlistentry varlistentry !termxref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! varnamework_mem/. To avoid fluctuations in observed response time, ! it's desirable to have pending-list cleanup occur in the background ! (i.e., via autovacuum). Foreground cleanup operations can be avoided by ! increasing varnamework_mem/ or making autovacuum more aggressive. ! However, enlarging varnamework_mem/ means that if a foreground ! cleanup does occur, it will take even longer. /para /listitem /varlistentry --- 813,839 /varlistentry varlistentry !termliteralPENDING_LIST_CLEANUP_SIZE/ and !xref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! literalPENDING_LIST_CLEANUP_SIZE/ (if not set, varnamework_mem/ ! is used as that threshold, instead). To avoid fluctuations in observed ! 
response time, it's desirable to have pending-list cleanup occur in the ! background (i.e., via autovacuum). Foreground cleanup operations ! can be avoided by increasing literalPENDING_LIST_CLEANUP_SIZE/ ! (or varnamework_mem/) or making autovacuum more aggressive. ! However, enlarging the threshold of the cleanup operation means that ! if a foreground cleanup does occur, it will take even longer. ! /para ! para ! literalPENDING_LIST_CLEANUP_SIZE/ is an index storage parameter, ! and allows each GIN index to have its own cleanup threshold. ! For example, it's possible to increase the threshold only for the GIN ! index which can be updated heavily, and decrease it otherwise. /para /listitem /varlistentry *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ --- 356,377 /listitem /varlistentry /variablelist +variablelist +varlistentry + termliteralPENDING_LIST_CLEANUP_SIZE//term + listitem + para + This setting specifies the maximum size
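The threshold logic the patch documents reduces to something like the following sketch; the field and variable names are illustrative, not the actual patch code:

    /*
     * Pick the pending-list cleanup threshold for one GIN index: use the
     * per-index PENDING_LIST_CLEANUP_SIZE reloption when set, otherwise
     * fall back to work_mem.  Both inputs are in kilobytes here.
     */
    static long
    pending_list_cleanup_threshold(int reloption_kb, int work_mem_kb)
    {
        int     threshold_kb;

        /* -1 means "reloption not set" in this sketch */
        threshold_kb = (reloption_kb >= 0) ? reloption_kb : work_mem_kb;
        return (long) threshold_kb * 1024L;     /* bytes */
    }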
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On Tue, Sep 9, 2014 at 9:07 PM, didier did...@gmail.com wrote: Hi Well, I'd like to hear someone from the field complaining that pg_receivexlog is thrashing the cache and thus reducing the performance of some other process. Or at least a synthetic test case that demonstrates that happening.

It's not with pg_receivexlog but it's related. On a small box without a replication server connected, performance was good enough, but not so with a replication server connected: there was 1GB worth of WAL sitting in RAM vs next to nothing without a slave!

After a WAL file is filled up and closed, it will not be re-read if wal_level is set to minimal (i.e., neither archiving nor replication is enabled). So, in this case, PostgreSQL advises the OS to release any cached pages of that WAL file. But if archiving or replication is enabled, the WAL file keeps being cached even after it's closed. Probably this is the cause of what you observed, I guess.

Regards,

-- Fujii Masao

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
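For reference, the hint in question is essentially a one-liner: once a WAL segment is closed and known not to be re-read, the kernel can be advised to drop its cached pages. A minimal sketch (note POSIX_FADV_DONTNEED is only a hint, and the OS is free to ignore it):

    #include <fcntl.h>

    /* Advise the kernel to evict a finished WAL segment from page cache. */
    static void
    drop_from_page_cache(int fd)
    {
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }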
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote:

>>>>> "Heikki" == Heikki Linnakangas hlinnakan...@vmware.com writes:

Heikki> Uh, that's ugly. The EXPLAIN output I mean; as an implementation
Heikki> detail chaining the nodes might be reasonable. But the above
Heikki> gets unreadable if you have more than a few grouping sets.

It's good for highlighting performance issues in EXPLAIN, too.

Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Memory Alignment in Postgres
I'm continuously studying the Postgres codebase. Hopefully I'll be able to make some contributions in the future.

For now I'm intrigued by the extensive use of memory alignment. I'm sure there's some legacy, and some architectures that require it, behind this reasoning. That aside, since it wastes space (a lot of space in some cases) there must be a tipping point somewhere. I'm sure one can prove aligned access is faster in a micro-benchmark, but I'm not sure it's the case in a DBMS like Postgres, especially in the page/rows area.

Just for the sake of comparison, MySQL COMPACT storage (default and recommended since 5.5) doesn't align data at all. MySQL NDB uses a fixed 4-byte alignment. Not sure about Oracle and others.

Is it worth the extra space on newer architectures (especially Intel)? Do you guys think this is something worth looking at?

I've been trying to mess with the *ALIGN macros, but so far I wasn't able to get any conclusive results. My guess is that I'm missing something in the code, or pgbench doesn't stress the difference enough.

-- Arthur Silva
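For readers following along: the macros in question round a length up to the next multiple of the platform's strictest alignment. A simplified, self-contained version of the TYPEALIGN/MAXALIGN macros from src/include/c.h, with MAXIMUM_ALIGNOF hard-coded to 8 for illustration:

    #include <stdint.h>
    #include <stdio.h>

    /* Round LEN up to the next multiple of ALIGNVAL (a power of two). */
    #define TYPEALIGN(ALIGNVAL, LEN) \
        (((intptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((intptr_t) ((ALIGNVAL) - 1)))
    #define MAXALIGN(LEN) TYPEALIGN(8, LEN)

    int
    main(void)
    {
        /* A 1-byte value that must be max-aligned costs 8 bytes, not 1. */
        printf("%ld\n", (long) MAXALIGN(1));    /* prints 8 */
        printf("%ld\n", (long) MAXALIGN(9));    /* prints 16 */
        return 0;
    }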
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra t...@fuzzy.cz wrote: So I only posted the separate patch for those who want to do a review, and then a complete patch with both parts combined. But it sure may be a bit confusing.

Let's do this: post each new version of the patches only on this thread, as a series of patches that can be applied one after another.

In ExecChooseHashTableSize(), if buckets_bytes is independent of nbatch, could we just compute it before working out dbatch, and just deduct it from hash_table_bytes? That seems like it'd do the same thing for less work.

Well, we can do that. But I really don't think that's going to make a measurable difference. And I think the code is more clear this way.

Really? It seems to me that you're doing more or less the same calculation that's already been done a second time. It took me 20 minutes of staring to figure out what it was really doing. Now admittedly I was a little low on caffeine, but...

Either way, what happens if buckets_bytes is already bigger than hash_table_bytes?

Hm, I don't see how that could happen.

How about an Assert() and a comment, then?

A few more stylistic issues that I see:

+ if ((hashtable->nbatch == 1) &&
+ if (hashtable->nbuckets_optimal <= (INT_MAX/2))
+ if (size > (HASH_CHUNK_SIZE/8))

While I'm all in favor of using parentheses generously where it's needed to add clarity, these and similar usages seem over the top to me.

It seemed more readable for me at the time of coding it, and it still seems better this way, but ... https://www.youtube.com/watch?v=CxK_nA2iVXw

Heh. Well, at any rate, I think PostgreSQL style is to not use parentheses in that situation.

+char * chunk_alloc(HashJoinTable hashtable, int size) {

Eh... no.

+ /* XXX maybe we should use MAXALIGN(size) here ... */

+1.

I'm not sure what the 'no' is pointing at? Maybe that the parenthesis should be on the next line? And is the +1 about doing MAXALIGN?

The "no" is about the fact that you have the return type, the function name, and the opening brace on one line instead of three lines as is project style. The +1 is for applying MAXALIGN to the size.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
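For readers of the review above: the allocator under discussion amounts to bump allocation within a chunk, with MAXALIGN applied to each request. A self-contained sketch under those assumptions - the types and names are illustrative, not the patch itself:

    #include <stddef.h>

    /* Round up to an 8-byte boundary; stand-in for the real MAXALIGN. */
    #define MY_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)

    typedef struct HashChunk
    {
        size_t      maxlen;     /* usable size of data[] */
        size_t      used;       /* bytes handed out so far */
        char        data[];     /* allocation space follows the header */
    } HashChunk;

    /* Hand out "size" bytes from the chunk, or NULL if it doesn't fit. */
    static char *
    chunk_alloc(HashChunk *chunk, size_t size)
    {
        char       *ptr;

        size = MY_MAXALIGN(size);       /* keep every tuple aligned */
        if (chunk->used + size > chunk->maxlen)
            return NULL;                /* caller must start a new chunk */
        ptr = chunk->data + chunk->used;
        chunk->used += size;
        return ptr;
    }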
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
Etsuro Fujita wrote: I agree with you on that point. So, I've updated the patch to have the explicit flag, as you proposed. Attached is the updated version of the patch. In this version, I've also revised code and its comments a bit. Thank you, I have set the patch to Ready for Committer. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
On Sun, Sep 7, 2014 at 7:07 PM, Shigeru HANADA shigeru.han...@gmail.com wrote: I think it's probably good to give an FDW the option of producing a ForeignJoinPath for any join against a ForeignPath *or ForeignJoinPath* for the same FDW. It's perhaps unlikely that an FDW can perform a join efficiently between two data sources with different server definitions, but why not give it the option? It should be pretty fast for the FDW to realize, oh, the server OIDs don't match - and at that point it can exit without doing anything further if that seems desirable. And there might be some kinds of data sources where cross-server joins actually can be executed quickly (e.g. when the underlying data is just in two files in different places on the local machine).

Indeed, how to separate servers is left to users, or the author of the FDW, though postgres_fdw and most other FDWs can join foreign tables within a server. I think it would be good if we could know that two foreign tables are managed by the same FDW from the FdwRoutine, maybe by adding a new API which returns an FDW identifier?

Do we need this? I mean, if you get the FdwRoutine, don't you have the OID of the FDW or the foreign table also?

I think so too, so a ForeignJoinPath should be able to be an input of another ForeignJoinPath at an upper join level. But I also think whether to join on the remote side or not should be decided based on cost, as existing joins are planned with a bottom-up approach.

Definitely.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
On Wed, Aug 6, 2014 at 3:22 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Maybe we ought to break down and support a real expression syntax. Sounds like that would be better all around.

Adding operators is more or less orthogonal with providing a new expression syntax. I'm not sure that there is currently a crying need for it (a syntax). It would be a significant project. It would raise the question where to stop... And I just really need a few more functions/operators which can be fitted in the current implementation quite easily.

Writing a simple expression parser for pgbench using flex and bison would not be an inordinately difficult problem. And it would let you add abs() and ?: and thereby avoid the need to hard-code multiple versions of the modulo operator in the patch. The fact that you're agonizing about which modulo to add is a sign that the language is too impoverished to let you do anything non-trivial. That's why C only has one modulo operator: as the patch demonstrates, the differences between the versions can be patched over with a very short C expression. If we had real expressions in pgbench, you could do the same thing there.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
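The "very short C expression" alluded to above is the usual construction of floored (Knuth) modulo on top of C's truncating % operator, so that the result always takes the sign of the divisor:

    /* Floored modulo: floored_mod(-7, 3) == 2, whereas -7 % 3 == -1 in C. */
    static long
    floored_mod(long a, long b)
    {
        long    r = a % b;

        if (r != 0 && (r < 0) != (b < 0))
            r += b;                 /* pull the result toward b's sign */
        return r;
    }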
Re: [HACKERS] Memory Alignment in Postgres
On Tue, Sep 9, 2014 at 11:08:05AM -0300, Arthur Silva wrote: I'm continuously studying Postgres codebase. Hopefully I'll be able to make some contributions in the future. For now I'm intrigued about the extensive use of memory alignment. I'm sure there's some legacy and some architecture that requires it reasoning behind it. That aside, since it wastes space (a lot of space in some cases) there must be a tipping point somewhere. I'm sure one can prove aligned access is faster in a micro-benchmark but I'm not sure it's the case in a DBMS like postgres, specially in the page/rows area. Just for the sake of comparison Mysql COMPACT storage (default and recommended since 5.5) doesn't align data at all. Mysql NDB uses a fixed 4-byte alignment. Not sure about Oracle and others. Is it worth the extra space in newer architectures (specially Intel)? Do you guys think this is something worth looking at? I'm trying to messing with the *ALIGN macros but so far I wasn't able to get any conclusive results. My guess is that I'm missing something in the code or pg_bench doesn't stress the difference enough. Postgres reads data block from disk and puts them in shared memory, then the CPU accesses those values, like floats and integers, as though they were in allocated memory, i.e. we make no adjustments to the data from disk all the way to CPU. I don't think anyone has measured the overhead of doing less alignment, but I would be interested to see any test results produced. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
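One way to start measuring this, as a sketch: a storage format without alignment padding has to fetch values from arbitrary offsets, which portable C does via memcpy; timing this against naturally aligned direct loads on the target hardware would give the micro-level part of the answer (the timing harness is omitted here):

    #include <stdint.h>
    #include <string.h>

    /* Fetch a 4-byte int from a possibly unaligned buffer offset. */
    static int32_t
    fetch_int32(const char *buf, size_t offset)
    {
        int32_t     v;

        memcpy(&v, buf + offset, sizeof(v));    /* safe even if unaligned */
        return v;
    }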
Re: [HACKERS] pg_upgrade and epoch
On Sat, Sep 6, 2014 at 09:30:06AM -0400, Bruce Momjian wrote: On Fri, Sep 5, 2014 at 07:35:42PM -0400, Bruce Momjian wrote: On Sat, Sep 6, 2014 at 12:26:55AM +0100, Greg Stark wrote: On Wed, Sep 3, 2014 at 3:59 AM, Bruce Momjian br...@momjian.us wrote: I have developed the attached patch which causes pg_upgrade to preserve the transaction epoch. I plan to apply this for PG 9.5. I would say this is a simple bug and should be back patched to 9.4 and 9.3. We're only going to continue to get complaints from people running into this. Yes, I did think about that, but it seems like a behavior change. However, it is tempting to avoid future bug reports about this. When this came up in March, Tom and I agreed that this wasn't something we wanted to slip into 9.4. Given that, it is hard to argue we should now slip this into 9.5, 9.4, and 9.3, so unless someone else votes for inclusion, I think I will leave this as 9.5-only. With no one replying, I will consider this issue closed and not backpatch this. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Thanks. One more tweak --- the whole reason for fiddling with this is to ensure that event triggers support this operation. Therefore this node should be handled by ProcessUtilitySlow, not standard_ProcessUtility, as in the attached patch. [...] I propose this for 9.4 too.

Done.

I bet they have! Have fun,

Thanks! I'm trying to. :)

Stephen
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Robert,

Writing a simple expression parser for pgbench using flex and bison would not be an inordinately difficult problem.

Sure. Note that there is not only the parsing but also the evaluating to think of, which means a data structure to represent the expressions which would be more complex than the current one. Although it is not difficult as such, it would mean breaking pgbench heavily, which would mean more time and energy than I have available.

And it would let you add abs() and ?: and thereby avoid the need to hard-code multiple versions of the modulo operator in the patch.

It would also mean *implementing* abs and ?: in the execution code to compute the parsed expression.

The fact that you're agonizing about which modulo to add is a sign that the language is too impoverished to let you do anything non-trivial.

I'm not agonizing about which modulo to use :-) I know I do not want the C/SQL version, which is never really useful if you have signed data. I overlooked this detail when submitting the first patch, and produced a stupid patch with all 3 possible modulos, before going to the sane solution which is to use the floored division Knuth version. If I had thought a bit, I would have sent v3 as v1 directly.

That's why C only has one modulo operator: as the patch demonstrates, the differences between the versions can be patched over with a very short C expression. If we had real expressions in pgbench, you could do the same thing there.

Sure. I agree that pgbench is limited and that real expressions would have helped, but it is also quite useful and easy to extend as is. Maybe "add an expression parser to pgbench" could be added to the postgresql todo wiki?

-- Fabien.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Tue, Sep 9, 2014 at 11:19 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com: On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki> Uh, that's ugly. The EXPLAIN output, I mean; as an implementation Heikki> detail chaining the nodes might be reasonable. But the above Heikki> gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here. I don't think that showing sort and aggregation is a bad idea. Both can have different performance impacts. Sure, showing the sort and aggregation steps is fine. But I don't see what advantage we get out of showing them like this: Aggregate -> Sort -> ChainAggregate -> Sort -> ChainAggregate -> Sort When we could show them like this: Aggregate -> Sort -> Sort -> Sort From both a display perspective and an implementation-complexity perspective, it seems appealing to have the Aggregate node feed the data to one sort after another, rather than having it send the data down a very deep pipe. I might be missing something, of course. I don't want to presume that I'm smarter than Andrew, because Andrew is pretty smart. :-) But it seems odd to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com: On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki> Uh, that's ugly. The EXPLAIN output, I mean; as an implementation Heikki> detail chaining the nodes might be reasonable. But the above Heikki> gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here. I don't think that showing sort and aggregation is a bad idea. Both can have different performance impacts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Problems with config.php and non default ports (postgresql - sugarcrm)
Hello, I have a problem. We're using SugarCRM with a PostgreSQL database, but the database is not on the default port. After I put the port in the db_port parameter it doesn't seem to be recognized: the application still points to the other database instance on port 5432, and in that instance's log I can see Sugar trying to connect to it even after I change the port. Postgresql 9.3, Sugarcrm 6.5.0. Putting ip:port in the db_host_name parameter doesn't work either. So I need to know: is there another file where the port value needs to be set? I'll appreciate your help. Thanks.
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-09-09 7:55 GMT+02:00 Craig Ringer cr...@2ndquadrant.com: On 09/09/2014 05:20 AM, Robert Haas wrote: I previously proposed RAISE ASSERT ... WHERE, which seems like a nice variant of what we've already got, but perhaps this whole discussion merely illustrates that it's hard to get more than 1 vote for any proposal in this area. Well, you have Petr's for RAISE EXCEPTION ... WHEN, and I'd also like that or RAISE ASSERT ... WHEN. Ada is a language with a strong character, and PL/pgSQL is a slightly strange fork of it, so it isn't easy to find a form that satisfies all requirements. My requests: * be consistent with current PLpgSQL syntax and logic * allow some future extensibility * allow static analysis without heavy expression processing But I think there are some points where we can reach agreement, even though it is not an ASSERT implementation: enhancing RAISE with WHEN - please see the attached patch. I prefer RAISE ... WHEN over RAISE ... WHERE for consistency with EXIT and CONTINUE [ WHEN ]. Next, we can reserve an SQLCODE for assertions and implement it as an unhandled exception. Currently it is only a cancel, and that is not usable. It should probably be implemented at the SQL level - not in plpgsql only. Regards Pavel Much (much) saner than the other proposals on this thread IMO. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services commit 77ff7203d83e889aa9f5190fe54b04dc7c26a5aa Author: Pavel Stehule pavel.steh...@gooddata.com Date: Tue Sep 9 12:11:15 2014 +0200 initial diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 11cb47b..f8ac200 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2890,8 +2890,20 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt) char *err_datatype = NULL; char *err_table = NULL; char *err_schema = NULL; + ListCell *lc; + if (stmt->cond != NULL) + { + bool value; + bool isnull; + + value = exec_eval_boolean(estate, stmt->cond, &isnull); + exec_eval_cleanup(estate); + if (isnull || value == false) + return PLPGSQL_RC_OK; + } + /* RAISE with no parameters: re-throw current exception */ if (stmt->condname == NULL && stmt->message == NULL && stmt->options == NIL) diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index 893f3a4..1f0b861 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -63,6 +63,7 @@ static void current_token_is_not_variable(int tok); static PLpgSQL_expr *read_sql_construct(int until, int until2, int until3, + int until4, const char *expected, const char *sqlstart, bool isexpression, @@ -105,7 +106,7 @@ static void check_labels(const char *start_label, int end_location); static PLpgSQL_expr *read_cursor_args(PLpgSQL_var *cursor, int until, const char *expected); -static List *read_raise_options(void); +static List *read_raise_options(int *endtok); static void check_raise_parameters(PLpgSQL_stmt_raise *stmt); %} @@ -1422,6 +1423,7 @@ for_control : for_variable K_IN expr1 = read_sql_construct(DOT_DOT, K_LOOP, 0, + 0, "LOOP", "SELECT ", true, @@ -1716,6 +1718,7 @@ stmt_raise : K_RAISE { PLpgSQL_stmt_raise *new; int tok; + int endtok; new = palloc(sizeof(PLpgSQL_stmt_raise)); @@ -1726,6 +1729,7 @@ stmt_raise : K_RAISE new->message = NULL; new->params = NIL; new->options = NIL; + new->cond = NULL; tok = yylex(); if (tok == 0) @@ -1796,22 +1800,22 @@ stmt_raise : K_RAISE * or USING to begin the options list. */ tok = yylex(); -if (tok != ',' && tok != ';' && tok != K_USING) +if (tok != ',' && tok != ';' && tok != K_USING && tok != K_WHEN) yyerror("syntax error"); while (tok == ',') { PLpgSQL_expr *expr; - expr = read_sql_construct(',', ';', K_USING, - ", or ; or USING", + expr = read_sql_construct(',', ';', K_USING, K_WHEN, + ", or ; or USING or WHEN", "SELECT ", true, true, true, NULL, &tok); new->params = lappend(new->params, expr); } } - else if (tok != K_USING) + else if (tok != K_USING && tok != K_WHEN) { /* must be condition name or SQLSTATE */ if (tok_is_keyword(tok, &yylval, @@ -1847,7 +1851,13 @@ stmt_raise : K_RAISE } if (tok == K_USING) -new->options = read_raise_options(); + { +new->options = read_raise_options(&endtok); +if (endtok == K_WHEN) + new->cond = read_sql_expression(';', ";"); + } + if (tok == K_WHEN) +new->cond =
Re: [HACKERS] [PATCH] empty xml values
On 8/30/14 12:43 PM, Ali Akbar wrote: While looking into this report http://www.postgresql.org/message-id/cf48ccfb.65a9d%tim.k...@gmail.com I noticed that we don't accept empty values as xml content values, even though this should apparently be allowed by the spec. Attached is a patch to fix it (which needs updates to xml_1.out, which I omit here for simplicity). The patch works, albeit must be applied with --ignore-whitespace. Attached the patch + xml_1.out updates. I'm marking this ready for commit Committed, thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
On Tue, Sep 9, 2014 at 11:07 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: The fact that you're agonizing about which modulo to add is a sign that the language is too impoverished to let you do anything non-trivial. I'm not agonizing about which modulo to use:-) I know I do not want the C/SQL version which is never really useful if you have signed data. I overlooked this detail when submitting the first patch, and produced a stupid patch with all 3 possible modulos, before going to the sane solution which is to use the floored division Knuth version. If I had thought a bit, I would have sent v3 as v1 directly. Sure, and I would have looked at that patch and complained that you were implementing a modulo operator with different semantics than C. And then we'd be right back where we are now. The problem here isn't that I don't know what you want, or that what you want is unreasonable. The problem is that we can't cater to every slightly-different thing that somebody wants to do with pgbench. If we try, we'll end up with a neverending jumble of features most of which never get used by more than 1 or 2 people. Features for any part of PostgreSQL need to be of reasonably general interest, not something that is super-specialized to one particular set of needs. If I commit what you're asking me to commit here, I think that's exactly what I'll be doing, and I don't think that's a good idea. In all seriousness, sometimes the right solution to these problems is just to patch your own copy. I've done that with pgbench at least once and with PostgreSQL in general more times than I can conveniently count. I've learned a lot of useful things that way, but I can't expect all of the debugging instrumentation and trial features that I've created to get incorporated into the product. It's not reasonable, and it's not worth the time it would take me to make the code general and maintainable, so the code just gets dropped on the floor. In a perfect world, that wouldn't happen, but in a perfect world they'd pay me the same salary they pay Linus Torvalds. In this particular instance, we got onto this topic in the first place because of the Gaussian and exponential patches, and the desire to have the hot portion of the keys distributed in some random way through the data set rather than having everything be clustered at the front. As you yourself said, the best case for this patch is to allow a poor-man's approximation of that. Adding a weird non-standard operator for a poor approximation of the feature we're really looking for just doesn't feel like the right call. I recognize that you have a different view, and if enough people agree with you, that argument may win the day. But my opinion is that we are too far down in the weeds. We should be trying to create features that will have general appeal to pgbench users, not features that solve narrow problems for individual developers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert == Robert Haas robertmh...@gmail.com writes: Robert> Sure, showing the sort and aggregation steps is fine. But I Robert> don't see what advantage we get out of showing them like Robert> this: Robert> Aggregate Robert> -> Sort Robert> -> ChainAggregate Robert> -> Sort Robert> -> ChainAggregate Robert> -> Sort The advantage is that this is how the plan tree is actually structured. Robert> When we could show them like this: Robert> Aggregate Robert> -> Sort Robert> -> Sort Robert> -> Sort And we can't structure the plan tree like this, because then it wouldn't be a _tree_ any more. The Sort node expects to have a child node to fetch rows from, and it expects all the usual plan tree mechanics (initialization, rescan, etc.) to work on that child node. There's no way for the parent to feed data to the child. Robert> From both a display perspective and an Robert> implementation-complexity perspective, ... says the person who has never tried implementing it. Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but no... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problems with config.php and non default ports (postgresql - sugarcrm)
On 09/08/2014 05:27 PM, Bianca Santana Espichicoquez wrote: Hello, I've a problem, we're using sugarcrm, and we have a database postgresql, but not in the default port, so, when I try to connect after I put the port in the db_port parameter, but seems like he not recognized, because still pointing to the other instance of the database that have the 5432 port, and I can see in that log when sugar it's trying to connect, even when I change the port. Postgresql 9.3 Sugarcrm 6.5.0 If i put in the db_host_name parameter: ip:port not work either. So, I need to know if exist another file where need to put the port value? I'll appreciate your help. Thanks. Please post on the correct list. Neither pgsql-advocacy nor pgsql-hackers is the correct list. Either a Sugarcrm list or pgsql-general might be the correct forum. Also, cross-posting on more than one postgresql list is generally frowned on. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract
On Mon, Sep 8, 2014 at 6:20 PM, David G Johnston david.g.johns...@gmail.com wrote: On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] [hidden email] wrote: On Thu, Sep 4, 2014 at 6:38 PM, David Johnston [hidden email] wrote: One of the trade-offs I mentioned...its more style than anything but removing the parenthetical (if there is not error in the command) and writing it more directly seemed preferable in an overview such as this. Better: The function will either throw an error or return a PGresult object[...] Nope. This is not C++, nor is it the backend. It will not throw anything. The current sentence reads: The response to this (if there is no error in the command) will be a PGresult object bearing a status code of PGRES_COPY_OUT or PGRES_COPY_IN (depending on the specified copy direction). Maybe this is taken for granted by others, and thus can be excluded here, but I'm trying to specify what happens if there is an error in the command... Apparently one does not get back a PGresult object... Well, there's the point of confusion, because the function ALWAYS returns a PGresult, except maybe in some obscure corner case where it can't allocate one and therefore returns NULL. What differs is the result status. Ignoring style and such did anything I wrote help to clarify your understanding of why pgPutCopyEnd does what it does? As I say this and start to try and read the C code I think that it is a good source for understanding novice assumptions but there is a gap between the docs and the code - though I'm not sure you've identified the only (or even proper) location. Honestly, not really. I thought the language I previously discussed with Tom was adequate for that; and I'm a little confused as to why we've gotten on what seems to be somewhat of a digression into nearby but otherwise-unrelated documentation issues. Unlike PQputCopyData there is no explicit logic (pqIsnonblocking(conn) in PQputCopyEnd for non-blocking mode (there is an implicit one via pqFlush). This does seem like an oversight - if a minor one since the likihood of not being able to add the EOF to the connection's buffer seems highly unlikely - but I would expect on the basis of symmetry alone - for both of them to have buffer filled testing logic. And, depending on how large *errormsg is, the risk of being unable to place the data in the buffer isn't as small and the expected EOF case. Yeah, I think that's a bug in PQputCopyEnd(). It should have the same kind of pqCheckOutBufferSpace() check that PQputCopyData() does. I'm getting way beyond my knowledge level here but my assumption from the documentation was that the async mode behavior of returning zero revolved around retrying in order to place the data into the buffer - not retrying to make sure the buffer is empty. The caller deals with that on their own based upon their blocking mode decision. Though we would want to call pqFlush for blocking mode callers here since this function should block until the buffer is clear. PQputCopyEnd() already does flush; and PQputCopyData() shouldn't flush. So, I thought I agreed with your suggestion that if the final pqFlush returns a 1 that pqPutCopyEnd should return 0. Well, Tom set me straight on that; so I don't think we're considering any such change. I think you need to make sure you understand the previous discussion in detail before proposing how to adjust the documentation (or the code). 
Additionally, if in non-blocking mode, and especially if *errormsg is not NULL, the available connection buffer length logic in pqPutCopyData should be evaluated here as well. Yes. See the comment three up from this one. The most useful and compatible solution is to make pqPutCopyEnd synchronous regardless of the user selected blocking mode - which it mostly is now but the final pqFlush should be in a loop - and adjust the documentation in the areas noted in my patch-email to accommodate that fact. Uh, no. If you do that, you'll break the code that I spent so much time writing, which led to my original post. I wouldn't be using non-blocking mode if it were OK for it to block. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
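To pin down the contract being discussed, here is a sketch of how a non-blocking caller would be expected to drive PQputCopyEnd() and PQflush() -- written against the documented behavior (retry on 0) rather than the current implementation, which as noted in this thread never actually returns 0. wait_until_writable() is a placeholder for a select()/poll() loop on PQsocket(conn):

#include <libpq-fe.h>

extern void wait_until_writable(PGconn *conn);  /* placeholder: wait on PQsocket(conn) */

static int
finish_copy_nonblocking(PGconn *conn)
{
    int rc;

    /* Retry while the termination message cannot be queued (documented 0 case). */
    while ((rc = PQputCopyEnd(conn, NULL)) == 0)
        wait_until_writable(conn);
    if (rc < 0)
        return -1;              /* error, e.g. not in COPY_IN state */

    /* A return of 1 only means "queued": drain the buffer before PQgetResult(). */
    while ((rc = PQflush(conn)) == 1)
        wait_until_writable(conn);

    return (rc == 0) ? 0 : -1;
}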
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert Haas robertmh...@gmail.com writes: On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but n... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. I'm less than convinced that it's true ... I've been meaning to find time to review this patch, but it sounds like it's getting to the point where I need to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On 2014-08-28 13:54:28 +0200, Andres Freund wrote: On 2014-08-28 13:20:07 +0200, Andres Freund wrote: I've attached a incremental patch. Apparently I didn't attach the patch, as so much a file containing the name of the patchfile... Which you obviously didn't integrate. And didn't comment on. Grr. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Robert == Robert Haas robertmh...@gmail.com writes: Robert> Sure, showing the sort and aggregation steps is fine. But I Robert> don't see what advantage we get out of showing them like Robert> this: Robert> Aggregate Robert> -> Sort Robert> -> ChainAggregate Robert> -> Sort Robert> -> ChainAggregate Robert> -> Sort The advantage is that this is how the plan tree is actually structured. I do understand that. I am questioning (as I believe Heikki was also) whether it's structured correctly. Nobody is arguing for displaying the plan tree in a way that doesn't mirror its actual structure, or at least I am not. The Sort node expects to have a child node to fetch rows from, and it expects all the usual plan tree mechanics (initialization, rescan, etc.) to work on that child node. There's no way for the parent to feed data to the child. OK, good point. So we do need something that can feed data from one part of the plan tree to another, like a CTE does. I still think it would be worth trying to see if there's a reasonable way to structure the plan tree so that it's flatter. Robert> From both a display perspective and an Robert> implementation-complexity perspective, ... says the person who has never tried implementing it. This comment to me reads rather sharply, and I don't feel that the tone of my email is such as to merit a rebuke. Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but no... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LIMIT for UPDATE and DELETE
On Tue, Sep 9, 2014 at 2:57 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should work with inheritance. So the path forward is (using Marko's phrasing upthread): 1) We put the LIMIT inside ModifyTable like this patch does. This doesn't prevent us from doing ORDER BY in the future, but helps numerous people who today have to 2) Someone rewrites how UPDATE works based on Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us, which allows us to support ORDER BY on all tables (or perhaps maybe not FDWs, I don't know how those work). The LIMIT functionality in this patch is unaffected. What's not clear to me is whether it make sense to do 1) without 2) ? Is UPDATE .. LIMIT without support for an ORDER BY useful enough? I've wanted LIMIT even without ORDER BY many times, so I'd vote yes. And if we apply this patch now, how much of it needs to be rewritten after 2) ? If the answers are yes and not much, then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should do done first. On that I can't give any useful feedback. Cheers, Jeff
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 29/07/14 18:51, Robert Haas wrote: On Mon, Jul 28, 2014 at 1:50 PM, Andres Freund and...@2ndquadrant.com wrote: What I'm thinking of is providing an actual API for the writes instead of hooking into the socket API in a couple places. I.e. have something like typedef struct DestIO DestIO; struct DestIO { void (*flush)(struct DestIO *io); int (*putbytes)(struct DestIO *io, const char *s, size_t len); int (*getbytes)(struct DestIO *io, const char *s, size_t len); ... } and do everything through it. I haven't thought much about the specific API we want, but abstracting the communication properly instead of adding hooks here and there is imo much more likely to succeed in the long term. This sounds suspiciously like the DestReceiver thing we've already got, except that the DestReceiver only applies to tuple results, not errors and notices and so on. I'm not totally unamenable to a bigger refactoring here, but right now it looks to me like a solution in search of a problem. The hooks are simple and seem to work fine; I don't want to add notation for its own sake. I am not sure if what Andres proposed is the right solution, but I do agree here that the hook definitely isn't. I am also not sure that I like the pq_redirect_to_shm_mq being directly exposed for use in bgworker, what I would like is to have elog interface to which you tell that you want errors sent to shm_mq handle and that one would then do the necessary calls (It's basically same principle but for the sake of cleanliness/correctness I think it should be elog and not pq from the point of bgworker). So the interface needs work. I do agree that we don't need two way communication over this channel, I think the dsm_mq can be used directly quite well for generic usecase. Otherwise I like the patch, and I am glad you also made the GUC save/restore infrastructure. Please don't forget to add this to next commitfest. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Tom == Tom Lane t...@sss.pgh.pa.us writes: Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but no... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. Tom> I'm less than convinced that it's true ... Maybe you can figure out how, but I certainly didn't see a reasonable way. I would also question one aspect of the desirability - using the CTE mechanism has the downside of needing an extra tuplestore with the full input data set in it, whereas the chain mechanism only has aggregated data in its tuplestore, which should be much smaller. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add shutdown_at_recovery_target option to recovery.conf
Hi, I recently wanted several times to have slave server prepared at certain point in time to reduce the time it takes for it to replay remaining WALs (say I have pg_basebackup -x on busy db for example). Currently the way to do it is to have pause_at_recovery_target true (default) and wait until pg accepts connection and the shut it down. The issue is that this is ugly, and also there is a chance that somebody else connects and does bad things (tm) before my process does. So I wrote simple patch that adds option to shut down the cluster once recovery_target is reached. The server will still be able to continue WAL replay if needed later or can be configured to start standalone. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 0f1ff34..278bbaa 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -309,6 +309,36 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows /varlistentry /variablelist + + varlistentry id=shutdown-at-recovery-target + xreflabel=shutdown_at_recovery_target + termvarnameshutdown_at_recovery_target/varname (typeboolean/type) + indexterm +primaryvarnameshutdown_at_recovery_target/ recovery parameter/primary + /indexterm + /term + listitem + para +Specifies whether postgres should shutdown the once the recovery target +is reached. The default is false. +This is intended to have instance ready at exact replay point desired. +The instance will still be able to replay more WAL records. +Note that because reconvery.conf will not be removed if this option is +set to true, the above means that unless you change your configuration, +any subsequent start will end with immediate shutdown. + /para + para +This setting has no effect if xref linkend=guc-hot-standby is not +enabled, or if no recovery target is set. + /para + para +Setting this to true will set link linkend=pause-at-recovery-target +varnamepause_at_recovery_target//link to false. + /para + /listitem + /varlistentry + + /variablelist /sect1 sect1 id=standby-settings diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 34f2fc0..bbe3987 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -222,6 +222,7 @@ static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; static bool recoveryTargetInclusive = true; static bool recoveryPauseAtTarget = true; +static bool recoveryShutdownAtTarget = false; static TransactionId recoveryTargetXid; static TimestampTz recoveryTargetTime; static char *recoveryTargetName; @@ -5159,6 +5160,18 @@ readRecoveryCommandFile(void) (errmsg_internal(pause_at_recovery_target = '%s', item-value))); } + else if (strcmp(item-name, shutdown_at_recovery_target) == 0) + { + if (!parse_bool(item-value, recoveryShutdownAtTarget)) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(parameter \%s\ requires a Boolean value, shutdown_at_recovery_target))); + ereport(DEBUG2, + (errmsg_internal(shutdown_at_recovery_target = '%s', + item-value))); + if (recoveryShutdownAtTarget) +recoveryPauseAtTarget = false; + } else if (strcmp(item-name, recovery_target_timeline) == 0) { rtliGiven = true; @@ -6907,6 +6920,24 @@ StartupXLOG(void) ereport(LOG, (errmsg(last completed transaction was at log time %s, timestamptz_to_str(xtime; + + /* + * Shutdown here when requested. 
We need to exit this early because + * we want to be able to continue the WAL replay when started + * the next time. + */ + if (recoveryShutdownAtTarget reachedStopPoint) + { +ereport(LOG, (errmsg(shutting down))); +/* + * Note that we exit with status 2 instead of 0 here to force + * postmaster shutdown the whole instance. This produces one + * unnecessary log line, but we don't have better way to + * shutdown postmaster from within single backend. + */ +proc_exit(2); + } + InRedo = false; } else -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
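For reference, a minimal recovery.conf exercising the proposed option might look like this (the path and timestamp are invented for the example):

restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2014-09-09 12:00:00'
shutdown_at_recovery_target = true   # proposed: shut down at the target instead of pausing

On the next start the server would replay up to recovery_target_time and then exit, leaving the data directory ready either to continue WAL replay later or to be started standalone.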
Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract
On Tue, Sep 9, 2014 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Sep 8, 2014 at 6:20 PM, David G Johnston david.g.johns...@gmail.com wrote: On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] [hidden email] wrote: On Thu, Sep 4, 2014 at 6:38 PM, David Johnston [hidden email] wrote: Ignoring style and such did anything I wrote help to clarify your understanding of why pgPutCopyEnd does what it does? As I say this and start to try and read the C code I think that it is a good source for understanding novice assumptions but there is a gap between the docs and the code - though I'm not sure you've identified the only (or even proper) location. Honestly, not really. I thought the language I previously discussed with Tom was adequate for that; and I'm a little confused as to why we've gotten on what seems to be somewhat of a digression into nearby but otherwise-unrelated documentation issues. ​My theory here is that if you discover a single point of confusion that cannot by attributed to a typo (or something basic like that) then why not go looking around and see if there are any other issues in nearby code/documentation. Furthermore, not being the writer of said code/documentation, ​a fresh perspective of the entire body of work - and not just the few lines you quoted - has value. For me, if nothing else I took it as a chance to learn and evaluate the status quo. And as noted below there are a couple of items to address in the nearby code even if the documentation itself is accurate - I just took a very indirect route to discover them. Much of the current documentation is copy-and-pasted directly from the source code - with a few comments tacked on for clarity. I'm sure some layout and style concerns were present during the copy-and-paste but the user-facing documentation does not, and probably should not ideally, directly emulate the source code comments. Admittedly, in the libpq section this can be considerably relaxed since the target user is likely a C programmer. Unlike PQputCopyData there is no explicit logic (pqIsnonblocking(conn) in PQputCopyEnd for non-blocking mode (there is an implicit one via pqFlush). This does seem like an oversight - if a minor one since the likihood of not being able to add the EOF to the connection's buffer seems highly unlikely - but I would expect on the basis of symmetry alone - for both of them to have buffer filled testing logic. And, depending on how large *errormsg is, the risk of being unable to place the data in the buffer isn't as small and the expected EOF case. Yeah, I think that's a bug in PQputCopyEnd(). It should have the same kind of pqCheckOutBufferSpace() check that PQputCopyData() does. ​Anybody disagree here?​ ​ I'm getting way beyond my knowledge level here but my assumption from the documentation was that the async mode behavior of returning zero revolved around retrying in order to place the data into the buffer - not retrying to make sure the buffer is empty. The caller deals with that on their own based upon their blocking mode decision. Though we would want to call pqFlush for blocking mode callers here since this function should block until the buffer is clear. PQputCopyEnd() already does flush; and PQputCopyData() shouldn't flush. ​My point being pgFlush is sensitive to whether the caller is in blocking mode. Also, pgPutCopyData DOES FLUSH if the buffer is full...then attempts to resize the buffer...then returns 0 (or -1 in blocking mode). 
How absolute is Tom's Well, we certainly do NOT want a flush in PQputCopyData. ? So, I thought I agreed with your suggestion that if the final pqFlush returns a 1 that pqPutCopyEnd should return 0. Well, Tom set me straight on that; so I don't think we're considering any such change. I think you need to make sure you understand the previous discussion in detail before proposing how to adjust the documentation (or the code). ​That was largely what this exercise was about for me...having gone through all this and now re-reading Tom's wording I do understand and agree. For surround documentation ​I think we are good since pqPutCopyData already tests the buffer and correctly returns 0 in non-blocking mode. The issue with pqPutCopyEnd seems to be a copy/paste error and an incorrect assumption regarding non-blocking mode. This doesn't address the usage bug we've been propagating where someone very well might use non-blocking mode and, upon seeing a return value of 1 from pqPutCopyEnd, immediately call pqGetResult even if the buffer is not completed flushed. Our proposed documentation directs people to always pgFlush poll after calling pqPutCopyEnd while the current documentation is not so strict. Is there anything to say/do about that fact or - more importantly - is there any real risk here? ​David J. ​
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Tue, Sep 9, 2014 at 12:33 PM, Petr Jelinek p...@2ndquadrant.com wrote: I am not sure if what Andres proposed is the right solution, but I do agree here that the hook definitely isn't. Well, that doesn't get me very far. Andres never responded to my previous comments about why I did it that way, and you're not proposing a clear alternative that I can either implement or comment on. I am also not sure that I like the pq_redirect_to_shm_mq being directly exposed for use in bgworker, what I would like is to have elog interface to which you tell that you want errors sent to shm_mq handle and that one would then do the necessary calls (It's basically same principle but for the sake of cleanliness/correctness I think it should be elog and not pq from the point of bgworker). I think that's completely wrong. As the patch series demonstrates, it's not limited to propagating ErrorResponse and NoticeResponse. It can also propagate NotifyResponse and RowDescription and DataRow and anything else that comes along. We are not just propagating errors; we are propagating all protocol messages of whatever type. So tying it to elog specifically is not right. So the interface needs work. I do agree that we don't need two way communication over this channel, I think the dsm_mq can be used directly quite well for generic usecase. I'm glad you agree, but that leaves me baffled as to what's wrong with the hook approach. There's no crying need for extensibility in this code; it's barely changed in a very long time, and I know of no other need which might soon require changing it again. The hook is very lightweight and shouldn't materially affect performance when it's not used, as a more complex approach might. Otherwise I like the patch, and I am glad you also made the GUC save/restore infrastructure. Cool. Please don't forget to add this to next commitfest. OK, done. But it would be awfully nice if we could actually make some progress on hammering out a design everyone can live with here. Letting it sit for another 5 weeks is not going to help anything, and it will hold up getting any more stuff after this done in time for 9.5. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
Hi, I worked bit on this patch to make it closer to committable state. There are several bugs fixed, including ones mentioned by Jamie (writing WAL during recovery). Also support for pg_resetxlog/pg_upgrade has been implemented by Andres. I added simple regression test and regression contrib module to cover both off and on settings. The SLRU issue Heikki mentioned should be also gone mainly thanks to 638cf09e7 (I did test it too). One notable thing missing is documentation for the three SQL level interfaces provided, I plan to add that soon. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c index 3b8241b..f0a023f 100644 --- a/contrib/pg_upgrade/pg_upgrade.c +++ b/contrib/pg_upgrade/pg_upgrade.c @@ -423,8 +423,10 @@ copy_clog_xlog_xid(void) /* set the next transaction id and epoch of the new cluster */ prep_status(Setting next transaction ID and epoch for new cluster); exec_prog(UTILITY_LOG_FILE, NULL, true, - \%s/pg_resetxlog\ -f -x %u \%s\, - new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, + \%s/pg_resetxlog\ -f -x %u -c %u \%s\, + new_cluster.bindir, + old_cluster.controldata.chkpnt_nxtxid, + old_cluster.controldata.chkpnt_nxtxid, new_cluster.pgdata); exec_prog(UTILITY_LOG_FILE, NULL, true, \%s/pg_resetxlog\ -f -e %u \%s\, diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c index cbcaaa6..81cbcaf 100644 --- a/contrib/pg_xlogdump/rmgrdesc.c +++ b/contrib/pg_xlogdump/rmgrdesc.c @@ -9,6 +9,7 @@ #include postgres.h #include access/clog.h +#include access/committs.h #include access/gin.h #include access/gist_private.h #include access/hash.h diff --git a/contrib/test_committs/.gitignore b/contrib/test_committs/.gitignore new file mode 100644 index 000..1f95503 --- /dev/null +++ b/contrib/test_committs/.gitignore @@ -0,0 +1,5 @@ +# Generated subdirectories +/log/ +/isolation_output/ +/regression_output/ +/tmp_check/ diff --git a/contrib/test_committs/Makefile b/contrib/test_committs/Makefile new file mode 100644 index 000..2240749 --- /dev/null +++ b/contrib/test_committs/Makefile @@ -0,0 +1,45 @@ +# Note: because we don't tell the Makefile there are any regression tests, +# we have to clean those result files explicitly +EXTRA_CLEAN = $(pg_regress_clean_files) ./regression_output ./isolation_output + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/test_committs +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + +# We can't support installcheck because normally installcheck users don't have +# the required track_commit_timestamp on +installcheck:; + +check: regresscheck + +submake-regress: + $(MAKE) -C $(top_builddir)/src/test/regress all + +submake-test_committs: + $(MAKE) -C $(top_builddir)/contrib/test_committs + +REGRESSCHECKS=committs_on + +regresscheck: all | submake-regress submake-test_committs + $(MKDIR_P) regression_output + $(pg_regress_check) \ + --temp-config $(top_srcdir)/contrib/test_committs/committs.conf \ + --temp-install=./tmp_check \ + --extra-install=contrib/test_committs \ + --outputdir=./regression_output \ + $(REGRESSCHECKS) + +regresscheck-install-force: | submake-regress submake-test_committs + $(pg_regress_installcheck) \ + --extra-install=contrib/test_committs \ + $(REGRESSCHECKS) + +PHONY: submake-test_committs submake-regress check \ + regresscheck regresscheck-install-force \ No newline at end of file diff --git a/contrib/test_committs/committs.conf b/contrib/test_committs/committs.conf new file mode 100644 index 000..d221a60 --- /dev/null +++ b/contrib/test_committs/committs.conf @@ -0,0 +1 @@ +track_commit_timestamp = on \ No newline at end of file diff --git a/contrib/test_committs/expected/committs_on.out b/contrib/test_committs/expected/committs_on.out new file mode 100644 index 000..9920343 --- /dev/null +++ b/contrib/test_committs/expected/committs_on.out @@ -0,0 +1,21 @@ +-- +-- Commit Timestamp (on) +-- +CREATE TABLE committs_test(id serial, ts timestamptz default now()); +INSERT INTO committs_test DEFAULT VALUES; +INSERT INTO committs_test DEFAULT VALUES; +INSERT INTO committs_test DEFAULT VALUES; +SELECT id, pg_get_transaction_extradata(xmin), + pg_get_transaction_committime(xmin) = ts, + pg_get_transaction_committime(xmin) now(), + pg_get_transaction_committime(xmin) - ts '60s' -- 60s should give a lot of reserve +FROM committs_test +ORDER BY id; + id | pg_get_transaction_extradata | ?column? | ?column? | ?column? ++--+--+--+-- + 1 |0 | t| t| t + 2 |0 | t| t| t + 3 |
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 09/09/14 18:49, Robert Haas wrote: On Tue, Sep 9, 2014 at 12:33 PM, Petr Jelinek p...@2ndquadrant.com wrote: I am also not sure that I like the pq_redirect_to_shm_mq being directly exposed for use in bgworker, what I would like is to have elog interface to which you tell that you want errors sent to shm_mq handle and that one would then do the necessary calls (It's basically same principle but for the sake of cleanliness/correctness I think it should be elog and not pq from the point of bgworker). I think that's completely wrong. As the patch series demonstrates, it's not limited to propagating ErrorResponse and NoticeResponse. It can also propagate NotifyResponse and RowDescription and DataRow and anything else that comes along. We are not just propagating errors; we are propagating all protocol messages of whatever type. So tying it to elog specifically is not right. Oh in that case, I think what Andres proposed is actually quite good. I know the hook works fine it just seems like using somewhat hackish solution to save 20 lines of code. The reason why I am not proposing anything better is that my solution when I did prototyping in this area was to just check if my pq_dsm_mq handle is set or not and call the appropriate function from the wrapper based on that which does not seem like big improvement over the hook approach... Anything better/more flexible that I could come up with is basically same idea what Andres already wrote. Please don't forget to add this to next commitfest. OK, done. But it would be awfully nice if we could actually make some progress on hammering out a design everyone can live with here. Letting it sit for another 5 weeks is not going to help anything, and it will hold up getting any more stuff after this done in time for 9.5. Yeah I said that just as formality, I wrote the response now and not in 5 weeks exactly for this reason, I am all for discussing this now and I think it's important to hammer this infrastructure out sooner rather than later. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Mon, Sep 8, 2014 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote: This has been pending for almost two months now and, at your request, my patch to make spinlocks act as compiler barriers is waiting behind it. Can we please get this moving again soon, or can I commit that patch and you can fix this when you get around to it? I finally pushed this. And once more I seriously got pissed at the poor overall worldwide state of documentation and continously changing terminology around this. There does seem to be a deficit in that area. Sorry for taking this long :( Do you have a current version of your patch to make them compiler barriers? I had forgotten that it needed an update. Thanks for the reminder. Here's v2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index efe1b43..38dc34d 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -154,6 +154,17 @@ s_lock(volatile slock_t *lock, const char *file, int line) return delays; } +#ifdef USE_DEFAULT_S_UNLOCK +void +s_unlock(slock_t *lock) +{ +#ifdef TAS_ACTIVE_WORD + *TAS_ACTIVE_WORD(lock) = -1; +#else + *lock = 0; +#endif +} +#endif /* * Set local copy of spins_per_delay during backend startup. diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 06dc963..8e0c4c3 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -55,14 +55,16 @@ * on Alpha TAS() will fail if interrupted. Therefore a retry loop must * always be used, even if you are certain the lock is free. * - * Another caution for users of these macros is that it is the caller's - * responsibility to ensure that the compiler doesn't re-order accesses - * to shared memory to precede the actual lock acquisition, or follow the - * lock release. Typically we handle this by using volatile-qualified - * pointers to refer to both the spinlock itself and the shared data - * structure being accessed within the spinlocked critical section. - * That fixes it because compilers are not allowed to re-order accesses - * to volatile objects relative to other such accesses. + * It is the responsibility of these macros to make sure that the compiler + * does not re-order accesses to shared memory to precede the actual lock + * acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this + * was the caller's responsibility, which meant that callers had to use + * volatile-qualified pointers to refer to both the spinlock itself and the + * shared data being accessed within the spinlocked critical section. This + * was notationally awkward, easy to forget (and thus error-prone), and + * prevented some useful compiler optimizations. For these reasons, we + * now require that the macros themselves prevent compiler re-ordering, + * so that the caller doesn't need to take special precautions. 
* * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and * S_UNLOCK() macros must further include hardware-level memory fence @@ -484,14 +486,14 @@ tas(volatile slock_t *lock) #define S_UNLOCK(lock) \ do \ { \ - __asm__ __volatile__ ( lwsync \n); \ + __asm__ __volatile__ ( lwsync \n ::: memory); \ *((volatile slock_t *) (lock)) = 0; \ } while (0) #else #define S_UNLOCK(lock) \ do \ { \ - __asm__ __volatile__ ( sync \n); \ + __asm__ __volatile__ ( sync \n ::: memory); \ *((volatile slock_t *) (lock)) = 0; \ } while (0) #endif /* USE_PPC_LWSYNC */ @@ -599,7 +601,9 @@ do \ .set noreorder \n \ .set nomacro\n \ sync\n \ - .set pop ); \ + .set pop +: +: memory); *((volatile slock_t *) (lock)) = 0; \ } while (0) @@ -657,6 +661,23 @@ tas(volatile slock_t *lock) typedef unsigned char slock_t; #endif +/* + * Note that this implementation is unsafe for any platform that can speculate + * a memory access (either load or store) after a following store. That + * happens not to be possible x86 and most legacy architectures (some are + * single-processor!), but many modern systems have weaker memory ordering. + * Those that do must define their own version S_UNLOCK() rather than relying + * on this one. + */ +#if !defined(S_UNLOCK) +#if defined(__INTEL_COMPILER) +#define S_UNLOCK(lock) \ + do { __memory_barrier(); *(lock) = 0; } while (0) +#else +#define S_UNLOCK(lock) \ + do { __asm__ __volatile__( : : : memory); *(lock) = 0; } while (0) +#endif +#endif #endif /* defined(__GNUC__) || defined(__INTEL_COMPILER) */ @@ -730,9 +751,13 @@ tas(volatile slock_t *lock) return (lockval == 0); } -#endif /* __GNUC__ */ +#define S_UNLOCK(lock) \ + do { \ + __asm__ __volatile__( : : : memory); \ + *TAS_ACTIVE_WORD(lock) = -1; \ + } while (0) -#define S_UNLOCK(lock) (*TAS_ACTIVE_WORD(lock) = -1) +#endif /* __GNUC__ */ #define
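To illustrate what this change buys callers, here is a hedged sketch (the SharedCounter structure is hypothetical) of a spinlocked critical section; with SpinLockAcquire()/SpinLockRelease() now acting as compiler barriers, the accesses inside no longer have to go through volatile-qualified pointers to keep the compiler from moving them out of the locked region:

#include "postgres.h"
#include "storage/spin.h"

typedef struct SharedCounter
{
    slock_t     mutex;
    uint64      value;
} SharedCounter;

static uint64
bump_counter(SharedCounter *c)
{
    uint64      newval;

    /* A plain pointer is now safe; pre-9.5 this required a volatile-qualified one. */
    SpinLockAcquire(&c->mutex);
    newval = ++c->value;
    SpinLockRelease(&c->mutex);

    return newval;
}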
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On Tue, Sep 9, 2014 at 8:07 AM, didier did...@gmail.com wrote: Well, I'd like to hear someone from the field complaining that pg_receivexlog is thrashing the cache and thus reducing the performance of some other process. Or a least a synthetic test case that demonstrates that happening. It's not with pg_receivexlog but it's related. On a small box without replication server connected perfs were good enough but not so with a replication server connected, there was 1GB worth of WAL sitting in RAM vs next to nothing without slave! setup: 8GB RAM 2GB shared_buffers (smaller has other issues) checkpoint_segments 40 (smaller value trigger too much xlog checkpoint) checkpoints spread over 10 mn and write 30 to 50% of shared buffers. live data set fit in RAM. constant load. On startup (1 or 2/hour) applications were running requests on cold data which were now saturating IO. I'm not sure it's an OS bug as the WAL were 'hotter' than the cold data. A cron task every minute with vmtouch -e for evicting old WAL files from memory has solved the issue. That seems like pretty good evidence that it might be worth doing something here. But I still think maybe it should be optional, because if the user plans to reread those files and, say, copy them somewhere else, then they won't want this behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
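The kernel call under discussion is small; something like this hypothetical helper, run after a finished WAL segment has been fsync'd, would implement the optional drop-from-cache behavior (callers planning to reread or copy the file, per Robert's caveat, would simply skip it):

#include <fcntl.h>

static void
evict_wal_segment_from_cache(int fd)
{
#ifdef POSIX_FADV_DONTNEED
    /* len = 0 means "to end of file"; the advice is best-effort, errors ignorable */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
}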
Re: [HACKERS] Built-in binning functions
Petr Jelinek p...@2ndquadrant.com writes: On 07/09/14 21:09, Andres Freund wrote: On 2014-09-07 15:05:35 -0400, Tom Lane wrote: I think the main remaining issue is that we don't have consensus on the function name AFAICT. I'm okay with using width_bucket(), as is done in the latest patch, but there were objections ... Just reread that part of the thread and personally I disliked all the other suggested names more than width_bucket. Same here, that's why I didn't change it. Not hearing any further discussion, I committed it with that name (and a bit of further cleanup). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
Hi, Given we already have three topics for --help and I can see others I went with my --help= proposal. On 2014-08-28 13:20:07 +0200, Andres Freund wrote: Some stuff I changed: * I rephrased the sgml changes * s/Printing options/Display options/. Or maybe Display influencing variables? That makes it clearer why they're listed under --help-variables. * I added \? commands as an alias for a plain \? That way the scheme can sensibly be expanded. * I renamed help_variables() to be inline with the surrounding functions. I integrated all those ontop of your help-variables-12.patch. Then I: * re-added psql -i which you, probably accidentally, removed * rephrased the sgml stuff * removed the Report bugs to from helpVariables() * Changed things so that both --help and \? support commands, options and variables as help topics * fixed things so that --help=wrongoption returns a proper exit code again I attached both the full and the incremental diff. I'd appreciate somebody looking over the the docs... I plan to push this soon. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 9fcf3f0983ccd617c20dc4e87889a894d873b92e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 9 Sep 2014 22:19:14 +0200 Subject: [PATCH] Add new psql help topics, accessible to both --help and \?. Add --help=topic for the commandline, and \? topic as a backslash command, to show more help than the invocations without parameters do. commands, variables and options currently exist as help topics describing, respectively, backslash commands, psql variables, and commandline switches. Without parameters the help commands show their previous topic. Author: Pavel Stehule, editorialized by many Reviewed-By: Andres Freund, Petr Jelinek, Fujii Masao, MauMau, Abhijit Menon-Sen and Erik Rijkers. Discussion: CAFj8pRDVGuC-nXBfe2CK8vpyzd2Dsr9GVpbrATAnZO=2yq0...@mail.gmail.com, cafj8pra54abtv2rxdtrxiad8hy8wxmovlqhjdrcwenhdd7o...@mail.gmail.com --- doc/src/sgml/ref/psql-ref.sgml | 23 - src/bin/psql/command.c | 14 ++- src/bin/psql/help.c| 210 +++-- src/bin/psql/help.h| 4 +- src/bin/psql/startup.c | 29 -- src/bin/psql/tab-complete.c| 7 ++ 6 files changed, 224 insertions(+), 63 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 29ad1aa..aa71674 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -560,11 +560,17 @@ EOF varlistentry termoption-?//term - termoption--help//term + termoption--help[=replaceable class=parametertopic/]/option/term listitem para - Show help about applicationpsql/application command line - arguments, and exit. + Show help about applicationpsql/application and exit. The optional + replaceable class=parametertopic/ parameter (defaulting + to literaloptions/literal) selects which part of psql is + explained: literalcommands/ describes applicationpsql/'s + backslash commands; literaloptions/ describes the commandline + switches that can be passed to applicationpsql/; + and literalvariables/ shows help about about psql configuration + variables. /para /listitem /varlistentry @@ -2574,10 +2580,17 @@ testdb=gt; userinput\setenv LESS -imx4F/userinput varlistentry -termliteral\?/literal/term +termliteral\? [ replaceable class=parametertopic/ ]/literal/term listitem para -Shows help information about the backslash commands. +Shows help information. 
The optional +replaceable class=parametertopic/ parameter +(defaulting to literalcommands/) selects which part of psql is +explained: literalcommands/ describes applicationpsql/'s +backslash commands; literaloptions/ describes the commandline +switches that can be passed to applicationpsql/; +and literalvariables/ shows help about about psql configuration +variables. /para /listitem /varlistentry diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 39b5777..5d90ca2 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1491,7 +1491,19 @@ exec_command(const char *cmd, /* \? -- slash command help */ else if (strcmp(cmd, ?) == 0) - slashUsage(pset.popt.topt.pager); + { + char *opt0 = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + if (!opt0 || strcmp(opt0, commands) == 0) + slashUsage(pset.popt.topt.pager); + else if (strcmp(opt0, options) == 0) + usage(pset.popt.topt.pager); + else if (strcmp(opt0, variables) == 0) + helpVariables(pset.popt.topt.pager); + else + slashUsage(pset.popt.topt.pager); + } #if 0 diff --git a/src/bin/psql/help.c
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek p...@2ndquadrant.com wrote: I think that's completely wrong. As the patch series demonstrates, it's not limited to propagating ErrorResponse and NoticeResponse. It can also propagate NotifyResponse and RowDescription and DataRow and anything else that comes along. We are not just propagating errors; we are propagating all protocol messages of whatever type. So tying it to elog specifically is not right. Oh, in that case, I think what Andres proposed is actually quite good. I know the hook works fine; it just seems like a somewhat hackish solution to save 20 lines of code. If it's 20 lines of code, I'm probably fine to go that way. Let's see if we can figure out what those 20 lines look like. libpq.h exports 29 functions that do a variety of different things. Of those, 20 are in pqcomm.c and the others are in be-secure.c. I presume that pluggability for the latter group, if needed at all, is a separate project. The remaining ones break down like this: - StreamServerPort, StreamConnection, StreamClose, and TouchSocketFiles are intended to be called only from the postmaster, to set up and tear down the listening socket and individual connections. Presumably this is not what we care about here. - pq_getbytes(), pq_getstring(), pq_getmessage(), pq_getbyte(), pq_peekbyte(), and pq_getbyte_if_available() handle *reading* data from the socket. Since you previously agreed that we didn't need to build two-way communication on top of this, I would think that would mean that these don't need to be pluggable either. But maybe I'm wrong. - The remaining functions are pq_init(), pq_comm_reset(), pq_flush(), pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(), pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout(). These are the ones that I think are potentially interesting. I didn't choose to provide hooks for all of these in the submitted patch because they're not all needed for what I want to do here: pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol support, which did not interest me (nor did COPY, really); pq_putmessage_noblock(), pq_flush_if_writable(), and pq_is_send_pending() are only used for the walsender protocol, which doesn't seem useful to redirect to a non-socket; and I just didn't happen to have any use for pq_init() or pq_comm_reset(). Hence what I ended up with. But, I could revisit that. Suppose I provide a structure with 10 function pointers for all ten of those functions, or maybe 9 since pq_init() is called so early that it's not likely we'll have control to put the hooks in place before that point, and anyway whatever code installs the hooks can do its own initialization then. Then make a global variable like pqSendMethods and #define pq_comm_reset() to be pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(), and so on for all 9 or 10 methods. Then the pqmq code could just change pqSendMethods to point to its own method structure instead of the default one. Would that address this concern? It's more than 20 lines of code, but it's not TOO bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
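For concreteness, a minimal sketch of the method table Robert describes might look like this (all names and signatures here are illustrative assumptions based on the email, not code from a committed patch):

    #include <stdbool.h>
    #include <stddef.h>

    /* One send-side method per redirectable libpq function. */
    typedef struct PQcommMethods
    {
        void    (*comm_reset) (void);
        int     (*flush) (void);
        int     (*flush_if_writable) (void);
        bool    (*is_send_pending) (void);
        int     (*putmessage) (char msgtype, const char *s, size_t len);
        void    (*putmessage_noblock) (char msgtype, const char *s, size_t len);
        void    (*startcopyout) (void);
        void    (*endcopyout) (bool errorAbort);
    } PQcommMethods;

    /* points at the socket-based implementation by default */
    extern PQcommMethods *pqSendMethods;

    /* existing call sites keep compiling unchanged */
    #define pq_comm_reset() (pqSendMethods->comm_reset())
    #define pq_flush() (pqSendMethods->flush())
    #define pq_putmessage(msgtype, s, len) \
        (pqSendMethods->putmessage(msgtype, s, len))

The pqmq code would then simply repoint pqSendMethods at its own method structure in the background worker to redirect every protocol message into a shm_mq.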
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Sep 4, 2014 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote: Eh, maybe? I'm not sure why the case where we're using abbreviated keys should be different than the case we're not. In either case this is a straightforward trade-off: if we do a memcmp() before strcoll(), we win if it returns 0 and lose if it returns non-zero and strcoll() also returns non-zero. (If memcmp() returns non-zero but strcoll() returns 0, it's a tie.) I'm not immediately sure why it should affect the calculus one way or the other whether abbreviated keys are in use; the question of how much faster memcmp() is than strcoll() seems like the relevant consideration either way. Not quite. Consider my earlier example of sorting ~300,000 cities by country only. That's a pretty low cardinality attribute. We win big, and we are almost certain that the abbreviated key cardinality is a perfect proxy for the full key cardinality, so we stick with abbreviated keys while copying over tuples. Sure, most comparisons will actually be resolved with a memcmp() == 0 rather than an abbreviated comparison, but under my ad-hoc cost model there is no distinction, since they're both very much cheaper than a strcoll() (particularly when we factor in the NUL termination copying that a memcmp() == 0 also avoids). To a lesser extent we're also justified in that optimism because we've already established that roughly the first 8 bytes of the string are bitwise equal. So the difference is that in the abbreviated key case, we are at least somewhat justified in our optimism. Whereas, where we're just eliding fmgr overhead, say on the 2nd or subsequent attribute of a multi-key sort, it's totally opportunistic to chance a memcmp() == 0. The latter optimization can only be justified by the fact that the memcmp() is somewhere between dirt cheap and free. That seems like something that should significantly impact the calculus. Boiled down, what you're saying is that you might have a set that contains lots of duplicates in general, but not very many where the abbreviated-keys also match. Sure, that's true. But you might also not have that case, so I don't see where that gets us; the same worst-case test case Heikki developed the last time we relitigated this point is still relevant here. In order to know how much we're giving up in that case, we need the exact number I asked you to provide in my previous email: the ratio of the cost of strcoll() to the cost of memcmp(). I see that you haven't chosen to provide that information in any of your four responses. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
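To make the trade-off concrete, here is a self-contained sketch of the opportunistic comparison being debated (plain C, not the actual varstr comparator; assumes NUL-terminated inputs and a collation established via setlocale()):

    #include <string.h>

    /* Win: equal strings are resolved by memcmp() alone, skipping
     * strcoll() and the NUL-termination copying it requires.
     * Lose: unequal strings pay for both memcmp() and strcoll(). */
    static int
    opportunistic_cmp(const char *a, const char *b)
    {
        size_t  alen = strlen(a);
        size_t  blen = strlen(b);

        if (alen == blen && memcmp(a, b, alen) == 0)
            return 0;           /* resolved cheaply */

        return strcoll(a, b);   /* full locale-aware comparison */
    }

The open question in the thread is exactly how cheap that extra memcmp() is, relative to strcoll(), when it fails to return 0.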
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-09-09 13:52:40 -0400, Robert Haas wrote: I had forgotten that it needed an update. Thanks for the reminder. Here's v2. I've attached a incremental patch fixing minor gripes. Other than that I think you can go ahead with this once the buildfarm accepts the sparc fixes (man, those machines are slow. spoonbill apparently takes ~5h for one run). I've done a read through s_lock.h and the only remaining potential issue that I see is that I have no idea if unixware's tas() is actually a safe compiler barrier (it is a memory barrier). And I really, really can't make myself care. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index 38dc34d..e8d3502 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -159,6 +159,7 @@ void s_unlock(slock_t *lock) { #ifdef TAS_ACTIVE_WORD + /* HP's PA-RISC */ *TAS_ACTIVE_WORD(lock) = -1; #else *lock = 0; diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 8e0c4c3..5f62a2c 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -57,8 +57,8 @@ * * It is the responsibility of these macros to make sure that the compiler * does not re-order accesses to shared memory to precede the actual lock - * acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this - * was the caller's responsibility, which meant that callers had to use + * acquisition, or following the lock release. Prior to PostgreSQL 9.5, + * this was the caller's responsibility, which meant that callers had to use * volatile-qualified pointers to refer to both the spinlock itself and the * shared data being accessed within the spinlocked critical section. This * was notationally awkward, easy to forget (and thus error-prone), and @@ -403,7 +403,7 @@ tas(volatile slock_t *lock) * No stbar or membar available, luckily no actually produced hardware * requires a barrier. */ -#define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0) +/* use the default S_UNLOCK definition for gcc */ #elif __sparcv8 /* stbar is available (and required for both PSO, RMO), membar isn't */ #define S_UNLOCK(lock) \ @@ -921,14 +921,14 @@ extern int tas_sema(volatile slock_t *lock); * store) after a following store; platforms where this is possible must * define their own S_UNLOCK. But CPU reordering is not the only concern: * if we simply defined S_UNLOCK() as an inline macro, the compiler might - * reorder instructions from the critical section to occur after the lock - * release. Since the compiler probably can't know what the external + * reorder instructions from inside the critical section to occur after the + * lock release. Since the compiler probably can't know what the external * function s_unlock is doing, putting the same logic there should be adequate. * A sufficiently-smart globally optimizing compiler could break that * assumption, though, and the cost of a function call for every spinlock * release may hurt performance significantly, so we use this implementation * only for platforms where we don't know of a suitable intrinsic. For the - * most part, those are relatively obscure platform/compiler platforms to + * most part, those are relatively obscure platform/compiler platforms to * which the PostgreSQL project does not have access. 
*/ #define USE_DEFAULT_S_UNLOCK -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
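To illustrate the reordering hazard the revised comment describes, consider a sketch like this (the types and shared variables are assumed for illustration):

    typedef int slock_t;
    extern volatile slock_t mutex;
    extern int shared_counter;      /* deliberately not volatile */

    /* a plain-macro unlock: only the lock word itself is volatile */
    #define S_UNLOCK_MACRO(lock) (*((volatile slock_t *) (lock)) = 0)

    void
    critical_section(void)
    {
        shared_counter++;           /* compiler may sink this store... */
        S_UNLOCK_MACRO(&mutex);     /* ...past this volatile release */
    }

Routing the release through an opaque external s_unlock() function, as the USE_DEFAULT_S_UNLOCK fallback does, forces the compiler to assume the callee might touch shared_counter, so the store cannot legally be moved past the call.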
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Fri, Sep 5, 2014 at 10:45 PM, Peter Geoghegan p...@heroku.com wrote: While I gave serious consideration to your idea of having a dedicated abbreviation comparator, and not duplicating sortsupport state when abbreviated keys are used (going so far as to almost fully implement the idea), I ultimately decided that my vote says we don't do that. It seemed to me that there were negligible benefits for increased complexity. In particular, I didn't want to burden tuplesort with having to worry about whether or not abbreviation was aborted during tuple copying, or was not used by the opclass in the first place - implementing your scheme makes that distinction relevant. It's very convenient to have comparetup_heap() compare the leading sort key (that specifically looks at SortTuple.datum1 pairs), using the same comparator for abbreviated and non-abbreviated cases indifferently. comparetup_heap() does not seem like a great place to burden with caring about each combination any more than strictly necessary. I like that I don't have to care about every combination, and can treat abbreviation abortion as the special case with the extra step, in line with how I think of the optimization conceptually. Does that make sense? No. comparetup_heap() is hip-deep in this optimization as it stands, and what I proposed - if done correctly - isn't going to make that significantly worse. In fact, it really ought to make things better: you should be able to set things up so that ssup->comparator is always the test that should be applied first, regardless of whether we're aborted or not-aborted or not doing this in the first place; and then ssup->tiebreak_comparator, if not NULL, can be applied after that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
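A sketch of the arrangement Robert proposes (simplified types; tiebreak_comparator is the name proposed in the email, not an existing field):

    typedef unsigned long Datum;

    typedef struct SortSupportData
    {
        int (*comparator) (Datum x, Datum y, struct SortSupportData *ssup);
        int (*tiebreak_comparator) (Datum x, Datum y, struct SortSupportData *ssup);
    } SortSupportData, *SortSupport;

    static int
    apply_sort_comparator(Datum x, Datum y, SortSupport ssup)
    {
        /* always applied first: abbreviated, aborted, or plain case */
        int compare = ssup->comparator(x, y, ssup);

        /* e.g. abbreviated keys compared equal: consult the full values */
        if (compare == 0 && ssup->tiebreak_comparator != NULL)
            compare = ssup->tiebreak_comparator(x, y, ssup);

        return compare;
    }

comparetup_heap() would then call this one path unconditionally, which is how the apply-ssup->comparator-first idea removes the per-combination branching Peter is worried about.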
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Tue, Sep 9, 2014 at 5:09 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-09 13:52:40 -0400, Robert Haas wrote: I had forgotten that it needed an update. Thanks for the reminder. Here's v2. I've attached a incremental patch fixing minor gripes. Other than that I think you can go ahead with this once the buildfarm accepts the sparc fixes (man, those machines are slow. spoonbill apparently takes ~5h for one run). I've done a read through s_lock.h and the only remaining potential issue that I see is that I have no idea if unixware's tas() is actually a safe compiler barrier (it is a memory barrier). And I really, really can't make myself care. *It is the responsibility of these macros to make sure that the compiler *does not re-order accesses to shared memory to precede the actual lock - *acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this - *was the caller's responsibility, which meant that callers had to use + *acquisition, or following the lock release. Prior to PostgreSQL 9.5, + * this was the caller's responsibility, which meant that callers had to use AFAICS my version is right and your version is grammatically incorrect. re-order to proceed or follow uses the same verb tense in both branches of the or; re-order to proceed or following does not. I agree that if there are problems on UnixWare, we can let anyone who cares about UnixWare submit a patch to fix them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-09-09 17:30:44 -0400, Robert Haas wrote: On Tue, Sep 9, 2014 at 5:09 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-09 13:52:40 -0400, Robert Haas wrote: I had forgotten that it needed an update. Thanks for the reminder. Here's v2. I've attached a incremental patch fixing minor gripes. Other than that I think you can go ahead with this once the buildfarm accepts the sparc fixes (man, those machines are slow. spoonbill apparently takes ~5h for one run). I've done a read through s_lock.h and the only remaining potential issue that I see is that I have no idea if unixware's tas() is actually a safe compiler barrier (it is a memory barrier). And I really, really can't make myself care. *It is the responsibility of these macros to make sure that the compiler *does not re-order accesses to shared memory to precede the actual lock - *acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this - *was the caller's responsibility, which meant that callers had to use + *acquisition, or following the lock release. Prior to PostgreSQL 9.5, + * this was the caller's responsibility, which meant that callers had to use AFAICS my version is right and your version is grammatically incorrect. re-order to proceed or follow uses the same verb tense in both branches of the or; re-order to proceed or following does not. Wasn't sure about that one. It read oddly to me, but then I'm not a native speaker. And won't read the sentence often ;) I agree that if there are problems on UnixWare, we can let anyone who cares about UnixWare submit a patch to fix them. Ok. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On Tue, Sep 9, 2014 at 5:32 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-09 17:30:44 -0400, Robert Haas wrote: On Tue, Sep 9, 2014 at 5:09 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-09 13:52:40 -0400, Robert Haas wrote: I had forgotten that it needed an update. Thanks for the reminder. Here's v2. I've attached a incremental patch fixing minor gripes. Other than that I think you can go ahead with this once the buildfarm accepts the sparc fixes (man, those machines are slow. spoonbill apparently takes ~5h for one run). I've done a read through s_lock.h and the only remaining potential issue that I see is that I have no idea if unixware's tas() is actually a safe compiler barrier (it is a memory barrier). And I really, really can't make myself care. *It is the responsibility of these macros to make sure that the compiler *does not re-order accesses to shared memory to precede the actual lock - *acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this - *was the caller's responsibility, which meant that callers had to use + *acquisition, or following the lock release. Prior to PostgreSQL 9.5, + * this was the caller's responsibility, which meant that callers had to use AFAICS my version is right and your version is grammatically incorrect. re-order to proceed or follow uses the same verb tense in both branches of the or; re-order to proceed or following does not. Wasn't sure about that one. It read oddly to me, but then I'm not a native speaker. And won't read the sentence often ;) I agree that if there are problems on UnixWare, we can let anyone who cares about UnixWare submit a patch to fix them. Ok. So, that's committed, then. I think we should pick something that uses spinlocks and is likely to fail spectacularly if we haven't got this totally right yet, and de-volatilize it. And then watch to see what turns red in the buildfarm and/or which users start screaming. I'm inclined to propose lwlock.c as a candidate, since that's very widely used and a place where we know there's significant contention. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-09-09 17:54:03 -0400, Robert Haas wrote: So, that's committed, then. Yay, finally. I think we should pick something that uses spinlocks and is likely to fail spectacularly if we haven't got this totally right yet, and de-volatilize it. And then watch to see what turns red in the buildfarm and/or which users start screaming. Good plan. I'm inclined to propose lwlock.c as a candidate, since that's very widely used and a place where we know there's significant contention. Additionally, I'd suggest GetSnapshotData() as a possible candidate. It's surely one of the hottest functions in postgres, and I've seen some performance increases from de-volatilizing it. IIRC it requires one volatile cast in one place to enforce that a variable is accessed only once. Not sure if we want to add such volatile casts or use something like Linux's ACCESS_ONCE macro for some common types. That would also help to grep for places worthy of inspection. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
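For reference, the Linux-style macro Andres mentions boils down to a volatile cast; an illustrative (GCC-specific, because of __typeof__) definition:

    /* Force exactly one load or store of x; also easy to grep for. */
    #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *) &(x))

    extern int shared_flag;

    int
    read_flag_once(void)
    {
        /* without this, the compiler may merge or repeat the read */
        return ACCESS_ONCE(shared_flag);
    }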
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 09/09/14 22:48, Robert Haas wrote: On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek p...@2ndquadrant.com wrote: I think that's completely wrong. As the patch series demonstrates, it's not limited to propagating ErrorResponse and NoticeResponse. It can also propagate NotifyResponse and RowDescription and DataRow and anything else that comes along. We are not just propagating errors; we are propagating all protocol messages of whatever type. So tying it to elog specifically is not right. Oh, in that case, I think what Andres proposed is actually quite good. I know the hook works fine; it just seems like a somewhat hackish solution to save 20 lines of code. If it's 20 lines of code, I'm probably fine to go that way. Let's see if we can figure out what those 20 lines look like. libpq.h exports 29 functions that do a variety of different things. Of those, 20 are in pqcomm.c and the others are in be-secure.c. I presume that pluggability for the latter group, if needed at all, is a separate project. The remaining ones break down like this: Ugh - The remaining functions are pq_init(), pq_comm_reset(), pq_flush(), pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(), pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout(). These are the ones that I think are potentially interesting. I didn't choose to provide hooks for all of these in the submitted patch because they're not all needed for what I want to do here: pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol support, which did not interest me (nor did COPY, really); pq_putmessage_noblock(), pq_flush_if_writable(), and pq_is_send_pending() are only used for the walsender protocol, which doesn't seem useful to redirect to a non-socket; and I just didn't happen to have any use for pq_init() or pq_comm_reset(). Hence what I ended up with. But, I could revisit that. Suppose I provide a structure with 10 function pointers for all ten of those functions, or maybe 9 since pq_init() is called so early that it's not likely we'll have control to put the hooks in place before that point, and anyway whatever code installs the hooks can do its own initialization then. Then make a global variable like pqSendMethods and #define pq_comm_reset() to be pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(), and so on for all 9 or 10 methods. Then the pqmq code could just change pqSendMethods to point to its own method structure instead of the default one. Would that address this concern? It's more than 20 lines of code, but it's not TOO bad. Yes, although my issue with the hooks was not that you only provided them for 2 functions, but the fact that it had no structure and the implementation was "if hook set, do this, else do that", which I don't see as a good way of doing it. All I personally want is structure and extensibility, so a struct with just the needed functions is good enough for me, and I would be ok with leaving the other 8 functions just as an option for whomever needs to make them overridable in the future. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]
On 2014-09-09 22:22:45 +0200, Andres Freund wrote: I plan to push this soon. Done. Thanks for the patch! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] FD_SETSIZE on Linux?
Hi, I noticed when trying to set pgbench's client count to a high number, I had to reduce it, and I found the maximum I can get away with is 1014. Any higher and I get: invalid number of clients: 1015 I find this in pgbench.c: #ifdef FD_SETSIZE #define MAXCLIENTS (FD_SETSIZE - 10) #else #define MAXCLIENTS 1024 #endif And FD_SETSIZE is defined before it: #ifdef WIN32 #define FD_SETSIZE 1024 /* set before winsock2.h is included */ #endif /* ! WIN32 */ ... but apparently only if using Windows, which I'm not. So it appears that MAXCLIENTS is being set to 1014 (1024 - 10), which looks like it should only be the case on Windows. I'm a bit confused here. Shouldn't my MAXCLIENTS be set to 1024? Thom
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On 9.9.2014 16:09, Robert Haas wrote: On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra t...@fuzzy.cz wrote: So I only posted the separate patch for those who want to do a review, and then a complete patch with both parts combined. But it sure may be a bit confusing. Let's do this: post each new version of the patches only on this thread, as a series of patches that can be applied one after another. OK, attached. Apply in this order: 1) dense-alloc-v5.patch 2) hashjoin-nbuckets-v12.patch In ExecChooseHashTableSize(), if buckets_bytes is independent of nbatch, could we just compute it before working out nbatch, and just deduct it from hash_table_bytes? That seems like it'd do the same thing for less work. Well, we can do that. But I really don't think that's going to make a measurable difference. And I think the code is more clear this way. Really? It seems to me that you're doing more or less the same calculation that's already been done a second time. It took me 20 minutes of staring to figure out what it was really doing. Now admittedly I was a little low on caffeine, but... I reworked this part a bit, to make it easier to understand. Mostly following your recommendations. Either way, what happens if buckets_bytes is already bigger than hash_table_bytes? Hm, I don't see how that could happen. How about an Assert() and a comment, then? Done. According to the comment, we should never really get over 25%, except maybe for strange work_mem values, because we're keeping nbuckets as 2^N. But even then we should not reach 50%, so I added this assert: Assert(buckets_bytes <= hash_table_bytes/2); And then we subtract the buckets_bytes, and continue with the loop. A few more stylistic issues that I see: + if ((hashtable->nbatch == 1) + if (hashtable->nbuckets_optimal <= (INT_MAX/2)) + if (size > (HASH_CHUNK_SIZE/8)) While I'm all in favor of using parentheses generously where it's needed to add clarity, these and similar usages seem over the top to me. It seemed more readable for me at the time of coding it, and it still seems better this way, but ... https://www.youtube.com/watch?v=CxK_nA2iVXw Heh. Well, at any rate, I think PostgreSQL style is to not use parentheses in that situation. FWIW, I added a HASH_CHUNK_THRESHOLD macro, specifying the threshold. I also simplified the conditions a bit, and removed some of the parentheses. +char * chunk_alloc(HashJoinTable hashtable, int size) { Eh... no. + /* XXX maybe we should use MAXALIGN(size) here ... */ +1. I'm not sure what the 'no' is pointing at. Maybe that the parenthesis should be on the next line? And is the +1 about doing MAXALIGN? The no is about the fact that you have the return type, the function name, and the opening brace on one line instead of three lines as is project style. The +1 is for applying MAXALIGN to the size. OK, fixed. I also did a few 'minor' changes to the dense allocation patch, most notably: * renamed HashChunk/HashChunkData to MemoryChunk/MemoryChunkData The original naming seemed a bit awkward. * renamed the function to 'dense_alloc' (instead of 'chunk_alloc') Seems like a better fit to what the function does. * the chunk size is 32kB (instead of 16kB), and we're using a 1/4 threshold for 'oversized' items We need the threshold to be >=8kB, to trigger the special case within AllocSet. The 1/4 rule is consistent with ALLOC_CHUNK_FRACTION. I also considered Heikki's suggestion that while rebatching, we can reuse chunks with a single large tuple, instead of copying it needlessly.
My attempt however made the code much uglier (I almost used a GOTO, but my hands refused to type it ...). I'll look into that. Tomas diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 589b2f1..d5acfb1 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -47,6 +47,7 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable, int bucketNumber); static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable); +static char * dense_alloc(HashJoinTable hashtable, int tupleSize); /* * ExecHash @@ -294,6 +295,8 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls) hashtable->spaceAllowedSkew = hashtable->spaceAllowed * SKEW_WORK_MEM_PERCENT / 100; + hashtable->chunks = NULL; + /* * Get info about the hash functions to be used for each hash key. Also * remember whether the join operators are strict. @@ -556,10 +559,10 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable) int oldnbatch = hashtable->nbatch; int curbatch = hashtable->curbatch; int nbatch; - int i; MemoryContext oldcxt; long ninmemory; long nfreed; + MemoryChunk oldchunks = hashtable->chunks; /* do nothing if we've decided to shut off growth */ if
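As a companion to the patch excerpt, here is a self-contained, simplified sketch of the dense allocation scheme (illustrative only: the real dense_alloc() hangs the chunk list off HashJoinTable and uses the executor's memory contexts rather than malloc()):

    #include <stddef.h>
    #include <stdlib.h>

    #define CHUNK_SIZE      32768               /* 32kB chunks */
    #define CHUNK_THRESHOLD (CHUNK_SIZE / 4)    /* 'oversized' cutoff */
    #define MAXALIGN_UP(s)  (((s) + 7) & ~(size_t) 7)

    typedef struct Chunk
    {
        struct Chunk *next;     /* singly-linked list of chunks */
        size_t  maxlen;
        size_t  used;
        char    data[];         /* tuples packed densely in here */
    } Chunk;

    static Chunk *chunks = NULL;    /* head = current chunk */

    static char *
    dense_alloc(size_t size)
    {
        Chunk  *c = chunks;

        size = MAXALIGN_UP(size);

        /* Oversized requests get a dedicated chunk; small requests get a
         * fresh 32kB chunk when the current one cannot fit them.  (The
         * real patch links oversized chunks behind the head so the
         * current chunk stays usable; this sketch just makes the new
         * chunk current.) */
        if (size > CHUNK_THRESHOLD || c == NULL ||
            c->maxlen - c->used < size)
        {
            size_t  len = (size > CHUNK_THRESHOLD) ? size : CHUNK_SIZE;

            c = malloc(offsetof(Chunk, data) + len);
            if (c == NULL)
                return NULL;
            c->maxlen = len;
            c->used = 0;
            c->next = chunks;
            chunks = c;
        }

        c->used += size;
        return c->data + c->used - size;
    }

The payoff discussed upthread is that per-tuple palloc headers and fragmentation disappear, and rebatching can walk and free whole chunks instead of individual tuples.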
Re: [HACKERS] FD_SETSIZE on Linux?
Thom Brown t...@linux.com writes: I find this in pgbench.c: #ifdef FD_SETSIZE #define MAXCLIENTS (FD_SETSIZE - 10) #else #define MAXCLIENTS 1024 #endif FD_SETSIZE is supposed to be defined, according to the POSIX spec: The sys/select.h header shall define the following symbolic constant, which shall have a value suitable for use in #if preprocessing directives: FD_SETSIZE Maximum number of file descriptors in an fd_set structure. It looks like Linux sets it to 1024. On RHEL6, at least, I find this: $ grep -r FD_SETSIZE /usr/include /usr/include/linux/posix_types.h:#undef __FD_SETSIZE /usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024 ... /usr/include/sys/select.h:#define FD_SETSIZE __FD_SETSIZE ... #ifdef WIN32 #define FD_SETSIZE 1024 /* set before winsock2.h is included */ #endif /* ! WIN32 */ Windows probably hasn't got sys/select.h at all, so it may not provide this symbol. Interestingly, it looks like POSIX also requires sys/time.h to define FD_SETSIZE. I wonder whether Windows has that header? It'd definitely be better to get this symbol from the system than assume 1024 will work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
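Following Tom's point, the pgbench block could take the limit from the system headers instead of assuming 1024; a minimal sketch:

    #include <sys/select.h>     /* POSIX: shall define FD_SETSIZE */

    #ifdef FD_SETSIZE
    #define MAXCLIENTS  (FD_SETSIZE - 10)   /* leave slack for other fds */
    #else
    #define MAXCLIENTS  1024                /* last-resort fallback */
    #endif

With the include in place, the Windows-only #define of FD_SETSIZE remains the only spot needing platform-specific handling.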
Re: [HACKERS] Escaping from blocked send() reprised.
Hmm. Sorry, I misunderstood the specification. Your approach of coloring tokens seems right, but you have broken the parse logic by adding your code. Other than the mistakes others have pointed out, I found that - the token style (quoted or not) of non-SQL-ident-like tokens is ignored, so the following line works. | local All aLL trust I suppose this is not what you intended. This is because you have ignored the attribute of a token when comparing it as a non-SQL-ident token. - '+' at the head of the sequence '+' is treated as the first character of the *quoted* string. e.g. +hoge is tokenized as +hoge:special_quoted. I found this is what was intended. This should be documented in comments. |2) users and user-groups only requires special handling and behavior as follows |Normal user : | A. unquoted ( USER ) will be treated as user ( downcase ). | B. quoted ( USeR ) will be treated as USeR (case-sensitive). | C. quoted ( +USER ) will be treated as normal user +USER (i.e. will not be considered as user-group) and case-sensitive as string is quoted. This seems confusing given the B below, and should be rearranged. | User Group : | A. unquoted ( +USERGROUP ) will be treated as +usergroup ( downcase ). | B. plus quoted ( +UserGROUP ) will be treated as +UserGROUP (case-sensitive). This is why you simply continued processing for '+' without discarding and skipping the '+', and not setting in_quote, so the following parser code does not work as intended. You should understand what the original code does and insert or modify the logic without breaking its assumptions. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 05/09/14 23:50, Amit Kapila wrote: On Fri, Sep 5, 2014 at 8:42 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 04/09/14 14:42, Amit Kapila wrote: On Thu, Sep 4, 2014 at 8:00 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: Hi Amit, Results look pretty good. Does it help in the read-write case too? Last time I ran the tpc-b test of pgbench (results of which are posted earlier in this thread), there doesn't seem to be any major gain for that, however for cases where read is predominant, you might see better gains. I am again planning to take that data in the next few days. FWIW below are some test results on the 60 core beast with this patch applied to 9.4. I'll need to do more runs to iron out the variation, but it looks like the patch helps the standard (write heavy) pgbench workload a little, and clearly helps the read only case. Thanks for doing the test. I think if possible you can take the performance data with a higher scale factor (4000) as it seems your m/c has 1TB of RAM. You might want to use the latest patch I have posted today. Here's some fairly typical data from read-write and read-only runs at scale 4000 for 9.4 beta2 with and without the v7 patch (below). I'm not seeing much variation between repeated read-write runs with the same config (which is nice - sleep 30 and explicit checkpoint call between each one seem to help there). Interestingly, I note anecdotally that (unpatched) 9.4 beta2 seems to be better at higher client counts than beta1 was... In terms of the effect of the patch - looks pretty similar to the scale 2000 results for read-write, but read-only is a different and more interesting story - unpatched 9.4 is noticeably impacted in the higher client counts, whereas the patched version scales as well (or even better perhaps) than in the scale 2000 case.

read write (600s)

Clients | tps (patched) | tps (unpatched)
--------+---------------+----------------
      6 |          9395 |            9334
     12 |         16605 |           16525
     24 |         24634 |           24910
     48 |         32170 |           31275
     96 |         35675 |           36533
    192 |         35579 |           31137
    384 |         30528 |           28308

read only (300s)

Clients | tps (patched) | tps (unpatched)
--------+---------------+----------------
      6 |         35743 |           35362
     12 |        111019 |          106579
     24 |        199746 |          160305
     48 |        327026 |          198407
     96 |        379184 |          171863
    192 |        356623 |          152224
    384 |        340878 |          128308

regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and epoch
On Tue, Sep 9, 2014 at 4:05 PM, Bruce Momjian br...@momjian.us wrote: Yes, I did think about that, but it seems like a behavior change. However, it is tempting to avoid future bug reports about this. When this came up in March, Tom and I agreed that this wasn't something we wanted to slip into 9.4. Given that, it is hard to argue we should now slip this into 9.5, 9.4, and 9.3, so unless someone else votes for inclusion, I think I will leave this as 9.5-only. With no one replying, I will consider this issue closed and not backpatch this. I think the reason nobody's responding is because nobody has anything significant to add. It's a behaviour change from not-working to working. Why wouldn't it be backpatched? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LIMIT for UPDATE and DELETE
(2014/09/09 18:57), Heikki Linnakangas wrote: On 09/03/2014 06:39 PM, Robert Haas wrote: Now some people might argue that we have that anyway, but the fact is that a lot of people are using inheritance today, even with all its flaws, and they wouldn't be if there were a long laundry list of limitations that didn't apply to regular tables. We should be looking to lift the limitations that currently exist, not add more. I agree. If we are to support UPDATE .. ORDER BY .. LIMIT, it should work with inheritance. So the path forward is (using Marko's phrasing upthread): 1) We put the LIMIT inside ModifyTable like this patch does. This doesn't prevent us from doing ORDER BY in the future, but helps numerous people who today have to 2) Someone rewrites how UPDATE works based on Tom's suggestion here: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us, which allows us to support ORDER BY on all tables (or perhaps maybe not FDWs, I don't know how those work). The LIMIT functionality in this patch is unaffected. What's not clear to me is whether it makes sense to do 1) without 2)? Is UPDATE .. LIMIT without support for an ORDER BY useful enough? And if we apply this patch now, how much of it needs to be rewritten after 2)? If the answers are yes and not much, then we should review this patch now, and put 2) on the TODO list. Otherwise 2) should be done first. My answers are yes, and completely rewritten. So, I think we should ideally work on 2) first, but 2) seems like a large project, at least to me. So, I think it would be reasonable to implement UPDATE/DELETE .. LIMIT based on Rukh's patch for now and put 2) and the re-implementation of 1) after 2) on the TODO list. I don't have enough time to review it for a while, so I'd like to work on it (and postgres_fdw's update pushdown enhancement related to it) at the next CF (I think I can do it earlier). I must apologize to Rukh for not having enough time for the patch review. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/09/09 22:17), Fujii Masao wrote: On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Attached is the updated version of the patch. Thank you for updating the patch! I did a quick review of the patch. It looks good to me, but one thing I'm concerned about is the following. You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider the existing applications, but I'm wondering if it would be a bit confusing to users to have two parameters, PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? Sorry for the delay. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/09 22:17), Fujii Masao wrote: On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Attached is the updated version of the patch. Thank you for updating the patch! I did a quick review of the patch. It looks good to me, Thanks for reviewing the patch! but one thing I'm concerned about is the following. You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider the existing applications, but I'm wondering if it would be a bit confusing to users to have two parameters, Yep. PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Sorry for the delay. No problem. Thanks! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
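The fallback rule under discussion might boil down to something like this sketch (the helper and its callers are illustrative assumptions, not the committed API):

    /* Pick the pending-list cleanup threshold in kB: the reloption wins
     * when it has been set (>= 0); otherwise fall back to work_mem.
     * Hypothetical helper; the names are assumptions. */
    static long
    pending_list_cleanup_threshold_kb(long reloption_kb, long work_mem_kb)
    {
        return (reloption_kb >= 0) ? reloption_kb : work_mem_kb;
    }

The existing work_mem test in ginfast.c, along the lines of metadata->nPendingPages * GIN_PAGE_FREESIZE > work_mem * 1024L, would then compare against this threshold instead, and a GUC of the same name would simply feed the same fallback chain.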
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 9, 2014 at 3:46 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 5, 2014 at 9:19 AM, Amit Kapila amit.kapil...@gmail.com wrote: One regression failed on linux due to a spacing issue which is fixed in the attached patch. I just read the latest patch out of curiosity; wouldn't it make more sense to split the patch into two different patches for clarity: one for the reclaimer worker centralized around BgMoveBuffersToFreelist and a second for the pg_stat portion? Those seem like two different features. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
On Mon, Sep 8, 2014 at 8:50 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Sep 3, 2014 at 11:40 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Aug 31, 2014 at 9:45 AM, Magnus Hagander mag...@hagander.net wrote: Do we really want those Asserts? There is not a single Assert in bin/pg_basebackup today - as is the case for most things in bin/. We typically use regular if statements for things that can happen, and just ignore the others I think - since the callers are fairly simple to trace. I have no opinion on whether we want these particular Assert() calls, but I note that using Assert() in front-end code only became possible in February of 2013, as a result of commit e1d25de35a2b1f809e8f8d7b182ce0af004f3ec9. So the lack of assertions there may not be so much because people thought it was a bad idea as that it didn't use to work. Generally, I favor the use of Assert() in front-end code in the same scenarios in which we use it in back-end code: for checks that shouldn't burden production builds but are useful during development. Well, that was exactly why they were added in the first place. The assertions have been placed in some functions to check for incompatible combinations of argument values when those functions are called. I don't mind re-adding them if people agree that they make sense. IMO they do, and they help as well with the development of future utilities using the replication protocol in src/bin/pg_basebackup, as much as the refactoring work done on this thread. Let's keep them then. We probably need to adjust the error message as well in that case, because it's no longer what's expected, it's what's required? OK, changed this way. The reason for the formulation of the current error message is that it's the same across all callsites emitting it, to make it easier for translators. It used to be more specific at some point and then was changed. Since these aren't expected to be hit much I don't really see a need to be very detailed. Re-changed back to that. Hm. I'd vote to simplify the code a bit based on the argument that the current API only looks at the first 3 columns and does not care about the 4th, which is the plugin name. Why not return all four columns from RunIdentifySystem(), setting plugin to NULL if not available? Then the caller can error out. This seems the cleaner approach, so I did so. New patches taking into account all those comments are attached. Thanks for the feedback. Regards, -- Michael From 6a864d1ef18a01e6680fceab181b78d266063200 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 1 Sep 2014 20:48:43 +0900 Subject: [PATCH 1/2] Refactoring of pg_basebackup utilities Code duplication is reduced with the introduction of new APIs for each individual replication command: - IDENTIFY_SYSTEM - CREATE_REPLICATION_SLOT - DROP_REPLICATION_SLOT A couple of variables used to identify a timeline ID are changed as well to be more consistent with core code.
--- src/bin/pg_basebackup/pg_basebackup.c | 21 + src/bin/pg_basebackup/pg_receivexlog.c | 49 +++ src/bin/pg_basebackup/pg_recvlogical.c | 132 +++- src/bin/pg_basebackup/streamutil.c | 153 + src/bin/pg_basebackup/streamutil.h | 10 +++ 5 files changed, 208 insertions(+), 157 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 8b9acea..0ebda9a 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -1569,8 +1569,8 @@ BaseBackup(void) { PGresult *res; char *sysidentifier; - uint32 latesttli; - uint32 starttli; + TimeLineID latesttli; + TimeLineID starttli; char *basebkp; char escaped_label[MAXPGPATH]; char *maxrate_clause = NULL; @@ -1624,23 +1624,8 @@ BaseBackup(void) /* * Run IDENTIFY_SYSTEM so we can get the timeline */ - res = PQexec(conn, IDENTIFY_SYSTEM); - if (PQresultStatus(res) != PGRES_TUPLES_OK) - { - fprintf(stderr, _(%s: could not send replication command \%s\: %s), -progname, IDENTIFY_SYSTEM, PQerrorMessage(conn)); + if (!RunIdentifySystem(conn, sysidentifier, latesttli, NULL, NULL)) disconnect_and_exit(1); - } - if (PQntuples(res) != 1 || PQnfields(res) 3) - { - fprintf(stderr, -_(%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n), -progname, PQntuples(res), PQnfields(res), 1, 3); - disconnect_and_exit(1); - } - sysidentifier = pg_strdup(PQgetvalue(res, 0, 0)); - latesttli = atoi(PQgetvalue(res, 0, 1)); - PQclear(res); /* * Start the actual backup diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c index a8b9ad3..cd7a41b 100644 --- a/src/bin/pg_basebackup/pg_receivexlog.c +++ b/src/bin/pg_basebackup/pg_receivexlog.c @@ -253,21 +253,10 @@ FindStreamingStart(uint32 *tli) static void StreamLog(void) { - PGresult *res;
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Sep 10, 2014 at 2:18 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 9, 2014 at 1:18 PM, Petr Jelinek p...@2ndquadrant.com wrote: I think that's completely wrong. As the patch series demonstrates, it's not limited to propagating ErrorResponse and NoticeResponse. It can also propagate NotifyResponse and RowDescription and DataRow and anything else that comes along. We are not just propagating errors; we are propagating all protocol messages of whatever type. So tying it to elog specifically is not right. Oh, in that case, I think what Andres proposed is actually quite good. I know the hook works fine; it just seems like a somewhat hackish solution to save 20 lines of code. If it's 20 lines of code, I'm probably fine to go that way. Let's see if we can figure out what those 20 lines look like. libpq.h exports 29 functions that do a variety of different things. Of those, 20 are in pqcomm.c and the others are in be-secure.c. I presume that pluggability for the latter group, if needed at all, is a separate project. The remaining ones break down like this: - StreamServerPort, StreamConnection, StreamClose, and TouchSocketFiles are intended to be called only from the postmaster, to set up and tear down the listening socket and individual connections. Presumably this is not what we care about here. - pq_getbytes(), pq_getstring(), pq_getmessage(), pq_getbyte(), pq_peekbyte(), and pq_getbyte_if_available() handle *reading* data from the socket. Since you previously agreed that we didn't need to build two-way communication on top of this, I would think that would mean that these don't need to be pluggable either. But maybe I'm wrong. - The remaining functions are pq_init(), pq_comm_reset(), pq_flush(), pq_flush_if_writable(), pq_is_send_pending(), pq_putmessage(), pq_putmessage_noblock(), pq_startcopyout(), and pq_endcopyout(). These are the ones that I think are potentially interesting. I didn't choose to provide hooks for all of these in the submitted patch because they're not all needed for what I want to do here: pq_startcopyout() and pq_endcopyout() are only needed for V2 protocol support, which did not interest me (nor did COPY, really); pq_putmessage_noblock(), pq_flush_if_writable(), and pq_is_send_pending() are only used for the walsender protocol, which doesn't seem useful to redirect to a non-socket; and I just didn't happen to have any use for pq_init() or pq_comm_reset(). Hence what I ended up with. But, I could revisit that. Suppose I provide a structure with 10 function pointers for all ten of those functions, or maybe 9 since pq_init() is called so early that it's not likely we'll have control to put the hooks in place before that point, and anyway whatever code installs the hooks can do its own initialization then. Can we use pq_init() to install function pointers? Let us say that it will take COMM_METHOD (tcp, shm_mq) as input and then install function pointers based on the communication method. We can call this from the main function of the bgworker (in the case of this patch, from pg_background_worker_main()) with COMM_METHOD as shm_mq, and BackendInitialize() will pass it as tcp. Then make a global variable like pqSendMethods and #define pq_comm_reset() to be pqSendMethods->comm_reset(), pq_flush() to be pqSendMethods->flush(), and so on for all 9 or 10 methods. Then the pqmq code could just change pqSendMethods to point to its own method structure instead of the default one. Would that address this concern? It's more than 20 lines of code, but it's not TOO bad.
This idea seems to be better than directly using hooks; however, I don't see any harm in defining pqReceiveMethods for the get APIs as well, because it would make the whole layer extensible. Having said that, since there is currently no use for it, we can leave that for another patch as well. A sketch of this dispatch appears below. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
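Amit's suggestion might look roughly like this, reusing the pqSendMethods-style table sketched earlier in the thread (the enum and method-struct names are illustrative assumptions, not code from the patch):

    typedef enum CommMethod
    {
        COMM_METHOD_TCP,        /* regular FE/BE socket */
        COMM_METHOD_SHM_MQ      /* redirect into a shm_mq */
    } CommMethod;

    extern PQcommMethods pq_socket_methods;    /* default socket I/O */
    extern PQcommMethods pq_mq_methods;        /* shm_mq redirection */
    PQcommMethods *pqSendMethods;

    void
    pq_init(CommMethod method)
    {
        /* select the method table before any message is sent */
        pqSendMethods = (method == COMM_METHOD_SHM_MQ)
            ? &pq_mq_methods
            : &pq_socket_methods;

        /* ...method-specific initialization would follow here... */
    }

A background worker would call pq_init(COMM_METHOD_SHM_MQ) from its main function, while BackendInitialize() would pass COMM_METHOD_TCP, exactly as Amit describes.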