Re: [HACKERS] Escaping from blocked send() reprised.
Sorry, I missed this message and only caught up when reading your CF status mail. I've attached three patches.

Could you let me know how to get the CF status mail?

I think he meant this email I sent last weekend: http://www.postgresql.org/message-id/542672d2.3060...@vmware.com

I see, that's what I also received. Thank you.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Typo fixes in src/backend/rewrite/rewriteHandler.c
Here are the comments in process_matched_tle() in rewriteHandler.c:

 883 * such nodes; consider
 884 * UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
 885 * The two expressions produced by the parser will look like
 886 * FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
 887 * FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))

I think the second one is not correct and should be

 FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))

Likewise,

 891 * FieldStore(FieldStore(col, fld1,
 892 *            FieldStore(placeholder, subfld1, x)),
 893 *           fld2, FieldStore(placeholder, subfld2, x))

should be

 FieldStore(FieldStore(col, fld1,
            FieldStore(placeholder, subfld1, x)),
           fld2, FieldStore(placeholder, subfld2, y))

Patch attached.

Thanks and best regards, Etsuro Fujita

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index cb65c05..93fda07 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -883,13 +883,13 @@ process_matched_tle(TargetEntry *src_tle,
  * UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
  * The two expressions produced by the parser will look like
  * FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
- * FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))
+ * FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))
  * However, we can ignore the substructure and just consider the top
  * FieldStore or ArrayRef from each assignment, because it works to
  * combine these as
  * FieldStore(FieldStore(col, fld1,
  *            FieldStore(placeholder, subfld1, x)),
- *           fld2, FieldStore(placeholder, subfld2, x))
+ *           fld2, FieldStore(placeholder, subfld2, y))
  * Note the leftmost expression goes on the inside so that the
  * assignments appear to occur left-to-right.
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/02/2014 02:52 AM, Peter Geoghegan wrote:

Having been surprisingly successful at advancing our understanding of the arguments for and against various approaches to value locking, I decided to try the same thing elsewhere. I have created a general-purpose UPSERT wiki page. The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks! Could you write down all of the discussed syntaxes, using a notation similar to the one we use in the manual, with examples of how to use them? And some examples of what is possible with some syntaxes and not with others? That would make it a lot easier to compare them.

- Heikki
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On 10/02/2014 03:20 AM, Kevin Grittner wrote:

My only concern from the benchmarks is that it seemed like there was a statistically significant increase in planning time: unpatched plan time average 0.450 ms, patched plan time average 0.536 ms. That *might* just be noise, but it seems likely to be real. For the improvement in run time, I'd put up with an extra 86us in planning time per hash join; but if there's any way to shave some of that off, all the better.

The patch doesn't modify the planner at all, so it would be rather surprising if it increased planning time. I'm willing to just write that off as noise.

- Heikki
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On 2 October 2014 at 2:20, Kevin Grittner wrote:

Tomas Vondra t...@fuzzy.cz wrote: On 12.9.2014 23:22, Robert Haas wrote: My first thought is to revert to NTUP_PER_BUCKET=1, but it's certainly arguable. Either method, though, figures to be better than doing nothing, so let's do something.

OK, but can we commit the remaining part first? Because it'll certainly interact with this proposed part, and it's easier to tweak when the code is already there, instead of rebasing over and over.

The patch applied and built without problem, and passes `make check-world`. The only thing that caught my eye was the addition of debug code using printf() instead of logging at a DEBUG level. Is there any reason for that?

Not really. IIRC the main reason is that the other code in nodeHash.c uses the same approach.

I still need to work through the (rather voluminous) email threads and the code to be totally comfortable with all the issues, but if Robert and Heikki are comfortable with it, I'm not gonna argue. ;-) Preliminary benchmarks look outstanding! I tried to control carefully to ensure consistent results (by dropping, recreating, vacuuming, analyzing, and checkpointing before each test), but there were surprising outliers in the unpatched version. It turned out that this was because slight differences in the random samples caused different numbers of buckets for both unpatched and patched, but the patched version dealt with the difference gracefully while the unpatched version's performance fluctuated randomly.

Can we get a bit more details on the test cases? I did a number of tests while working on the patch, and I generally tested two cases: (a) well-estimated queries (i.e. with nbuckets matching NTUP_PER_BUCKET) and (b) mis-estimated queries, with various levels of accuracy (say, 10x - 1000x misestimates). From the description, it seems you only tested (b) - is that correct? The thing is that while the resize is very fast and happens only once, it's not perfectly free. In my tests, this was always more than compensated for by the speedups (even in the weird cases reported by Stephen Frost), so I think we're safe here.

Also, I definitely recommend testing with different hash table sizes (say, work_mem=256MB and the hash table just barely fitting in without batching). The effect of CPU caches is very different for small and large hash tables. (This is not about work_mem alone, but about how much memory is used by the hash table - according to the results you posted, it never gets over ~4.5MB.)

You tested against current HEAD, right? This patch was split into three parts, two of which were already committed (45f6240a and 8cce08f1). The logic of the patch was "this might take a bit of time/memory, but it's compensated for by these other changes". Maybe running the tests on the original code would be interesting? Although, if this last part of the patch improves performance on its own, we're fine - it'll improve it even more compared to the old code (especially before lowering NTUP_PER_BUCKET from 10 to 1).

My only concern from the benchmarks is that it seemed like there was a statistically significant increase in planning time: unpatched plan time average 0.450 ms, patched plan time average 0.536 ms. That *might* just be noise, but it seems likely to be real. For the improvement in run time, I'd put up with an extra 86us in planning time per hash join; but if there's any way to shave some of that off, all the better.

I agree with Heikki that this is probably noise, because the patch does not mess with the planner at all. The only thing I can think of is adding a few fields to HashJoinTableData. Maybe this makes the structure too large to fit into a cache line, or something?

Tomas
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 2, 2014 at 12:07 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Could you write down all of the discussed syntaxes, using a notation similar to the one we use in the manual, with examples of how to use them? And some examples of what is possible with some syntaxes and not with others? That would make it a lot easier to compare them.

I've started off by adding varied examples of the use of the existing proposed syntax. I'll expand on this soon.

-- Peter Geoghegan
Re: [HACKERS] Escaping from blocked send() reprised.
Hello,

I propose the attached patch. It adds a new flag ImmediateDieOK, which is a weaker form of ImmediateInterruptOK that only allows handling a pending die-signal in the signal handler. Robert, others, do you see a problem with this?

Per se I don't have a problem with it. There does exist the problem that the user doesn't get an error message in more cases, though. On the other hand, it's bad if any user can prevent the database from restarting.

Over IM, Robert pointed out that it's not safe to jump out of a signal handler with siglongjmp when we're inside library calls, like in a callback called by OpenSSL. But even with the current master branch, that's exactly what we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means that any incoming signal will be handled directly in the signal handler, which can mean elog(ERROR). Should we be worried? OpenSSL might get confused if control never returns to the SSL_read() or SSL_write() function that called secure_raw_read().

But this is imo prohibitive. Yes, we've been doing it for a long while. But no, that's not OK. It actually prompted me into prototyping the latch thing (in some other thread). I don't think existing practice justifies expanding it further.

I see; in that case, this approach seems basically applicable. But if I understand correctly, this patch seems not to return out of the OpenSSL code even when the latch is found to be set in secure_raw_write/read. I tried setting errno = ECONNRESET and it went well, but it seems a bad deed.

secure_raw_write(Port *port, const void *ptr, size_t len)
{
    n = send(port->sock, ptr, len, 0);

    if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
    {
        w = WaitLatchOrSocket(&MyProc->procLatch, ...

        if (w & WL_LATCH_SET)
        {
            ResetLatch(&MyProc->procLatch);
            /*
             * Force a return, so interrupts can be processed when not
             * (possibly) underneath a ssl library.
             */
            errno = EINTR;
            return n;       /* n is negative */

my_sock_write(BIO *h, const char *buf, int size)
{
    res = secure_raw_write(((Port *) h->ptr), buf, size);
    BIO_clear_retry_flags(h);
    if (res <= 0)
    {
        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
        {
            BIO_set_retry_write(h);

-- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Replication identifiers, take 3
On 09/23/2014 09:24 PM, Andres Freund wrote:

I've previously started two threads about replication identifiers. Check http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de and http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de. They've also been discussed in the course of another thread: http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de And even earlier here: http://www.postgresql.org/message-id/flat/1339586927-13156-10-git-send-email-and...@2ndquadrant.com#1339586927-13156-10-git-send-email-and...@2ndquadrant.com The thread branched a lot; the relevant branch is the one with subject "[PATCH 10/16] Introduce the concept that wal has a 'origin' node".

== Identify the origin of changes ==

Say you're building a replication solution that allows two nodes to insert into the same table on two nodes. Ignoring conflict resolution and similar fun, one needs to prevent the same change from being replayed over and over. In logical replication the changes to the heap have to be WAL-logged, and thus the *replay* of changes from a remote node produces WAL which then will be decoded again. To avoid that, it's very useful to tag individual changes/transactions with their 'origin'. I.e. mark changes that have been directly triggered by the user sending SQL as originating 'locally', and changes originating from replaying another node's changes as originating somewhere else. If that origin is exposed to logical decoding output plugins, they can easily check whether to stream out the changes/transactions or not. It is possible to do this by adding extra columns to every table and storing the origin of a row in there, but that a) permanently needs storage and b) makes things much more invasive.

An origin column in the table itself helps tremendously to debug issues with the replication system. In many if not most scenarios, I think you'd want to have that extra column, even if it's not strictly required.

What I've previously suggested (and which works well in BDR) is to add the internal id to the XLogRecord struct. There are 2 free bytes of padding that can be used for that purpose.

Adding a field to XLogRecord for this feels wrong. This is for *logical* replication - why do you need to mess with something as physical as the WAL record format? And who's to say that a node ID is the most useful piece of information for a replication system to add to the WAL header? I can easily imagine that you'd want to put a changeset ID or something else in there instead. (I mentioned another example of this in http://www.postgresql.org/message-id/4fe17043.60...@enterprisedb.com)

If we need additional information added to WAL records, for extensions, then that should be done in an extensible fashion. IIRC (I couldn't find a link right now), when we discussed the changes to heap_insert et al for wal_level=logical, I already argued back then that we should make it possible for extensions to annotate WAL records, with things like "this is the primary key", or whatever information is needed for conflict resolution, or handling loops. I don't like it that we're adding little pieces of information to the WAL format, bit by bit.

- Heikki
Re: [HACKERS] Escaping from blocked send() reprised.
On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:

Hello,

I propose the attached patch. It adds a new flag ImmediateDieOK, which is a weaker form of ImmediateInterruptOK that only allows handling a pending die-signal in the signal handler. Robert, others, do you see a problem with this?

Per se I don't have a problem with it. There does exist the problem that the user doesn't get an error message in more cases, though. On the other hand, it's bad if any user can prevent the database from restarting.

Over IM, Robert pointed out that it's not safe to jump out of a signal handler with siglongjmp when we're inside library calls, like in a callback called by OpenSSL. But even with the current master branch, that's exactly what we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means that any incoming signal will be handled directly in the signal handler, which can mean elog(ERROR). Should we be worried? OpenSSL might get confused if control never returns to the SSL_read() or SSL_write() function that called secure_raw_read().

But this is imo prohibitive. Yes, we've been doing it for a long while. But no, that's not OK. It actually prompted me into prototyping the latch thing (in some other thread). I don't think existing practice justifies expanding it further.

I see; in that case, this approach seems basically applicable. But if I understand correctly, this patch seems not to return out of the OpenSSL code even when the latch is found to be set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There are several problems now where the inability to do real things while reading/writing is a problem.

I tried setting errno = ECONNRESET and it went well, but it seems a bad deed.

Where and why did you do that?

secure_raw_write(Port *port, const void *ptr, size_t len)
{
    n = send(port->sock, ptr, len, 0);

    if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
    {
        w = WaitLatchOrSocket(&MyProc->procLatch, ...

        if (w & WL_LATCH_SET)
        {
            ResetLatch(&MyProc->procLatch);
            /*
             * Force a return, so interrupts can be processed when not
             * (possibly) underneath a ssl library.
             */
            errno = EINTR;
            return n;       /* n is negative */

my_sock_write(BIO *h, const char *buf, int size)
{
    res = secure_raw_write(((Port *) h->ptr), buf, size);
    BIO_clear_retry_flags(h);
    if (res <= 0)
    {
        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
        {
            BIO_set_retry_write(h);

Hm, this seems, besides one comment, to be the code from the last patch in my series. Do you have a particular question about it?

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Replication identifiers, take 3
On 2014-10-02 11:49:31 +0300, Heikki Linnakangas wrote:

On 09/23/2014 09:24 PM, Andres Freund wrote: I've previously started two threads about replication identifiers. Check http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de and http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de. They've also been discussed in the course of another thread: http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de And even earlier here: http://www.postgresql.org/message-id/flat/1339586927-13156-10-git-send-email-and...@2ndquadrant.com#1339586927-13156-10-git-send-email-and...@2ndquadrant.com The thread branched a lot; the relevant branch is the one with subject "[PATCH 10/16] Introduce the concept that wal has a 'origin' node".

Right. Long time ago already ;)

== Identify the origin of changes ==

Say you're building a replication solution that allows two nodes to insert into the same table on two nodes. Ignoring conflict resolution and similar fun, one needs to prevent the same change from being replayed over and over. In logical replication the changes to the heap have to be WAL-logged, and thus the *replay* of changes from a remote node produces WAL which then will be decoded again. To avoid that, it's very useful to tag individual changes/transactions with their 'origin'. I.e. mark changes that have been directly triggered by the user sending SQL as originating 'locally', and changes originating from replaying another node's changes as originating somewhere else. If that origin is exposed to logical decoding output plugins, they can easily check whether to stream out the changes/transactions or not. It is possible to do this by adding extra columns to every table and storing the origin of a row in there, but that a) permanently needs storage and b) makes things much more invasive.

An origin column in the table itself helps tremendously to debug issues with the replication system. In many if not most scenarios, I think you'd want to have that extra column, even if it's not strictly required.

I don't think you'll have much success convincing actual customers of that. It's one thing to increase the size of the WAL stream a bit; it's entirely different to persistently increase the size of all their tables.

What I've previously suggested (and which works well in BDR) is to add the internal id to the XLogRecord struct. There are 2 free bytes of padding that can be used for that purpose.

Adding a field to XLogRecord for this feels wrong. This is for *logical* replication - why do you need to mess with something as physical as the WAL record format?

XLogRecord isn't all that physical. It doesn't encode anything in that regard but the fact that there are backup blocks in the record. It's essentially just an implementation detail of logging; whether that's physical or logical doesn't really matter much. There are basically three reasons I think it's a good idea to add it there:

a) There are many different types of records where it's useful to add the origin. Adding the information to all of these will make things more complicated, use more space, and be more fragile. And I'm pretty sure that the number of things people will want to expose over logical replication will increase. I know of at least two things that have at least some working code: exposing 2PC to logical decoding to allow optionally synchronous replication, and allowing transactional/nontransactional 'messages' to be sent via the WAL without writing to a table. Now, we could add a framework to attach general information to every record - but I have a very hard time seeing how that would be of comparable complexity *and* efficiency.

b) It's dead simple, with a pretty darn low cost. Both from a runtime as well as a maintenance perspective.

c) There needs to be crash recovery interaction anyway, to compute the state of how far replication succeeded before crashing. So it's not like we could make this completely extensible without core code knowing.

And who's to say that a node ID is the most useful piece of information for a replication system to add to the WAL header? I can easily imagine that you'd want to put a changeset ID or something else in there instead. (I mentioned another example of this in http://www.postgresql.org/message-id/4fe17043.60...@enterprisedb.com)

I'm on board with adding an extensible facility to attach data to successful transactions. There've been at least two people asking me directly about how to e.g. attach user information to transactions. I don't think that's equivalent to what I'm talking about here, though. One important thing about this proposal is that it allows us to completely skip (nearly - except cache inval) all records with an uninteresting origin id *before* decoding them, without having to keep any per-transaction state about 'uninteresting'
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On 2014-10-01 18:19:05 +0200, Ilya Kosmodemiansky wrote:

I have a patch which is actually not commitfest-ready yet, but it is always better to start discussing a proof of concept with some patch rather than just an idea.

That's a good way to start work on a topic like this.

From an Oracle DBA's point of view, we currently lack performance diagnostics tools.

Not just from an Oracle DBA's POV ;). Generally. So I'm happy to see some focus on this!

Saying that, principally they mean an Oracle Wait Interface analogue. The basic idea is to have counters or sensors all around the database kernel to measure what a particular backend is currently waiting for and how long/how often it waits.

Yes, I can see that. I'm not sure whether lwlocks are the primary point I'd start with, though. In many cases you'll wait on so-called 'heavyweight' locks too...

Suppose we have a PostgreSQL instance under heavy write workload, but we do not know any details. We could poll from time to time a pg_stat_lwlock function which would say "pid n1 is currently in WALWriteLock and pid n2 in WALInsertLock". That means we should think about write-ahead log tuning. Or "pid n1 is in some clog-related LWLock", which means we need to move clog to a ramdisk. This is a stupid example, but it shows how useful LWLock tracing could be for DBAs. An even better idea is to collect the daily LWLock distribution, find the most frequent ones, etc.

I think it's more complicated than that - but I also think it'd be a great help for DBAs and us postgres hackers.

An idea of this patch is to trace LWLocks with the lowest possible performance impact. We put an integer lwLockID into the procarray: acquiring the LWLock, we put its id into the procarray, and now we can poll the procarray using a function to see if a particular pid holds an LWLock.

But a backend can hold more than one lwlock at the same time? I don't think that's something we can ignore.
Not perfect, but if we see that somebody frequently holds a particular LWLock, we can investigate the matter in a more precise way using another tool. Something like that is implemented in the attached patch: issuing pgbench -c 50 -t 1000 -j 50 we get something like this:

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:43.848765+02 |       57 | 4257
(1 row)

Hm. So you just collect the lwlockid and the pid? That doesn't sound particularly interesting to me. In my opinion, you'd need at least:
* pid
* number of exclusive/shared acquisitions
* number of exclusive/shared acquisitions that had to wait
* total wait time of exclusive/shared acquisitions

Questions. 1. I've decided to put pg_stat_lwlock into an extension pg_stat_lwlock (simply for test purposes). Is it OK, or is it better to implement it somewhere inside pg_catalog or in another extension (for example pg_stat_statements)?

I personally am doubtful that it makes much sense to move this into an extension. It'll likely be tightly enough interlinked with backend code that I don't see the point. But I'd not be surprised if others feel differently.

I generally don't think you'll get interesting data without a fair bit of additional work. The first problem that comes to my mind about collecting enough data is that we have a very large number of lwlocks (fixed_number + 2 * shared_buffers). One 'trivial' way of implementing this is to have a per-backend array collecting the information, and then a shared one accumulating data from it over time. But I'm afraid that's not going to fly :(. Hm. With the above set of stats that'd be ~50MB per backend... Perhaps we should somehow encode this differently for individual lwlock tranches? It's far less problematic to collect all this information for all but the buffer lwlocks...
Greetings, Andres Freund
Re: [HACKERS] Escaping from blocked send() reprised.
Hi,

But this is imo prohibitive. Yes, we've been doing it for a long while. But no, that's not OK. It actually prompted me into prototyping the latch thing (in some other thread). I don't think existing practice justifies expanding it further.

I see; in that case, this approach seems basically applicable. But if I understand correctly, this patch seems not to return out of the OpenSSL code even when the latch is found to be set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There are several problems now where the inability to do real things while reading/writing is a problem.

I tried setting errno = ECONNRESET and it went well, but it seems a bad deed.

Where and why did you do that?

The patch of this message: http://www.postgresql.org/message-id/20140828.214704.93968088.horiguchi.kyot...@lab.ntt.co.jp

The reason for setting errno (instead of a variable for it) is to trick OpenSSL (or my_sock_write? I've forgotten) into recognizing it as if the underlying send(2) had returned with a non-continuable error, i.e. one that cannot be any of the continuable errnos (EINTR/EWOULDBLOCK/EAGAIN). In my faint memory, merely avoiding BIO_set_retry_write() in my_sock_write() doesn't work as expected, but it might be enough that my_sock_write returns -1 and doesn't set BIO_set_retry_write(). Why ECONNRESET? None of the other errnos possible for send(2) seems to fit the real situation, and the blocked state resembles a reset connection in that it cannot continue to work due to an external condition; ECONNRESET is also used in be_tls_write() in a similar way. Come to think of it, setting ECONNRESET is not so evil?

secure_raw_write(Port *port, const void *ptr, size_t len)
{
    n = send(port->sock, ptr, len, 0);

    if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
    {
        w = WaitLatchOrSocket(&MyProc->procLatch, ...

        if (w & WL_LATCH_SET)
        {
            ResetLatch(&MyProc->procLatch);
            /*
             * Force a return, so interrupts can be processed when not
             * (possibly) underneath a ssl library.
             */
            errno = EINTR;
            return n;       /* n is negative */

my_sock_write(BIO *h, const char *buf, int size)
{
    res = secure_raw_write(((Port *) h->ptr), buf, size);
    BIO_clear_retry_flags(h);
    if (res <= 0)
    {
        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
        {
            BIO_set_retry_write(h);

Hm, this seems, besides one comment, to be the code from the last patch in my series. Do you have a particular question about it?

I didn't have a particular question about it. This was cited only in order to show the route to retrying.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] pgbench throttling latency limit
On 09/15/2014 08:46 PM, Fabien COELHO wrote:

I'm not sure I like the idea of printing a percentage. It might be unclear what the denominator was if somebody feels the urge to work back to the actual number of skipped transactions. I mean, I guess it's probably just the value you passed to -R, so maybe that's easy enough, but then why bother dividing in the first place? The user can do that easily enough if they want the data that way.

Indeed, "skipped" and "late" per second may have an unclear denominator. If you divide by the time, the unit would be tps, but "120 tps performance including 20 late tps, plus 10 skipped tps"... I do not think that is clear. Reporting tps for transactions *not* performed looks strange. Maybe late transactions could be given as a percentage of all processed transactions in the interval. But for skipped, the percentage of what? The only number that would make sense is the total number of transactions scheduled in the interval, but that would mean that the denominator for late would be different than the denominator for skipped, which is basically incomprehensible.

Hmm. I guess the absolute number makes sense, if you expect that there are normally zero skipped transactions, or at least a very small number. It's more like a "good or no good" indicator. OK, I'm fine with that. The version I'm now working on prints output like this:

$ ./pgbench -T10 -P1 --rate=1600 --latency-limit=10
starting vacuum...end.
progress: 1.0 s, 1579.0 tps, lat 2.973 ms stddev 2.493, lag 2.414 ms, 4 skipped
progress: 2.0 s, 1570.0 tps, lat 2.140 ms stddev 1.783, lag 1.599 ms, 0 skipped
progress: 3.0 s, 1663.0 tps, lat 2.372 ms stddev 1.742, lag 1.843 ms, 4 skipped
progress: 4.0 s, 1603.2 tps, lat 2.435 ms stddev 2.247, lag 1.902 ms, 13 skipped
progress: 5.0 s, 1540.9 tps, lat 1.845 ms stddev 1.270, lag 1.303 ms, 0 skipped
progress: 6.0 s, 1588.0 tps, lat 1.630 ms stddev 1.003, lag 1.097 ms, 0 skipped
progress: 7.0 s, 1577.0 tps, lat 2.071 ms stddev 1.445, lag 1.517 ms, 0 skipped
progress: 8.0 s, 1669.9 tps, lat 2.375 ms stddev 1.917, lag 1.846 ms, 0 skipped
progress: 9.0 s, 1636.0 tps, lat 2.801 ms stddev 2.354, lag 2.250 ms, 5 skipped
progress: 10.0 s, 1606.1 tps, lat 2.751 ms stddev 2.117, lag 2.197 ms, 2 skipped
transaction type: TPC-B (sort of)
scaling factor: 5
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 16034
number of transactions skipped: 28 (0.174 %)
number of transactions above the 10.0 ms latency limit: 70 (0.436 %)
latency average: 2.343 ms
latency stddev: 1.940 ms
rate limit schedule lag: avg 1.801 (max 9.994) ms
tps = 1603.370819 (including connections establishing)
tps = 1603.619536 (excluding connections establishing)

Those progress lines are 79 or 80 characters wide, so they *just* fit in an 80-char terminal. Of course, if any of the printed numbers were higher, it would not fit. I don't see how to usefully make it more terse, though.

I think we can live with this - these days it shouldn't be a huge problem to enlarge the terminal to make the output fit.

Here are new patches; again, the first one is just refactoring, and the second one contains this feature. I'm planning to commit the first one shortly, and the second one later after people have had a chance to look at it.

Greg: As the author of pgbench-tools, what do you think of this patch? The log file format, in particular.
- Heikki

From 512fde5dc3fde5fc1368b3bf0c09e3ea8e022fad Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 2 Oct 2014 12:58:14 +0300
Subject: [PATCH 1/2] Refactor pgbench log-writing code to a separate function.

The doCustom function was incredibly long, this makes it a little bit
more readable.
---
 contrib/pgbench/pgbench.c | 340 +++---
 1 file changed, 169 insertions(+), 171 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 087e0d3..c14a577 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -347,6 +347,9 @@ static char *select_only = {
 static void setalarm(int seconds);
 static void *threadRun(void *arg);

+static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
+				  AggVals *agg);
+
 static void
 usage(void)
 {
@@ -1016,6 +1019,16 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	PGresult   *res;
 	Command   **commands;
 	bool		trans_needs_throttle = false;
+	instr_time	now;
+
+	/*
+	 * gettimeofday() isn't free, so we get the current timestamp lazily the
+	 * first time it's needed, and reuse the same value throughout this
+	 * function after that. This also ensures that e.g. the calculated latency
+	 * reported in the log file and in the totals are the same. Zero means
+	 * not set yet.
+	 */
+	INSTR_TIME_SET_ZERO(now);

 top:
 	commands = sql_files[st->use_file];
@@ -1049,10 +1062,10 @@ top:
 	if (st->sleeping)
 	{
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Thu, Oct 2, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: Not just from an Oracle DBA POV ;). Generally. sure Saying that, principally they mean an Oracle Wait Interface analogue. The basic idea is to have counters or sensors all around the database kernel to measure what a particular backend is currently waiting for and how long/how often it waits. Yes, I can see that. I'm not sure whether lwlocks are the primary point I'd start with though. In many cases you'll wait on so-called 'heavyweight' locks too... I try to kill two birds with one stone: do some preparatory work on the main, larger topic and deliver some convenience for LWLock diagnostics. Maybe I'm wrong, but it seems to me it is a much easier task to advocate some more desired feature: we have heavyweight lock diagnostics tools and they are better than those for lwlocks. Suppose we have a PostgreSQL instance under heavy write workload, but we do not know any details. We could poll the pg_stat_lwlock function from time to time, and it would say pid n1 is currently in WALWriteLock and pid n2 in WALInsertLock. That means we should think about write-ahead log tuning. Or pid n1 is in some clog-related LWLock, which means we need to move clog to ramdisk. This is a stupid example, but it shows how useful LWLock tracing could be for DBAs. An even better idea is to collect a daily LWLock distribution, find the most frequent ones, etc. I think it's more complicated than that - but I also think it'd be a great help for DBAs and us postgres hackers. Sure it is more complicated; the example is stupid, just to show the point. The idea of this patch is to trace LWLocks with the lowest possible performance impact. We put an integer lwLockID into the procarray: when acquiring an LWLock we put its id into the procarray, and now we can poll the procarray using a function to see if a particular pid holds an LWLock. But a backend can hold more than one lwlock at the same time? I don't think that's something we can ignore.
Yes, this is one of the next steps. I have not figured out yet how to do it less painfully than LWLOCK_STATS does. I personally am doubtful that it makes much sense to move this into an extension. It'll likely be tightly enough interlinked to backend code that I don't see the point. But I'd not be surprised if others feel differently. That's why I asked this question, and also because I have no idea where exactly to put these functions inside the backend if not into an extension. But there are probably more important tasks in this work than moving the functions inside; I can do that later if necessary. I generally don't think you'll get interesting data without a fair bit of additional work. Sure The first problem that comes to my mind about collecting enough data is that we have a very large number of lwlocks (fixed_number + 2 * shared_buffers). One 'trivial' way of implementing this is to have a per-backend array collecting the information, and then a shared one accumulating data from it over time. But I'm afraid that's not going to fly :(. Hm. With the above sets of stats that'd be ~50MB per backend... Perhaps we should somehow encode this differently for individual lwlock tranches? It's far less problematic to collect all this information for all but the buffer lwlocks... That is a good point. There are actually two things to keep in mind: i) user interface, ii) implementation i) Personally, as a DBA, I do not see much sense in an unaggregated list of pid, lwlockid, wait_time or something like that.
Much better to have aggregation by pid and lwlockid, for instance:
- pid
- lwlockid
- lwlockname
- total_count (or the number of exclusive/shared acquisitions that had to wait, as you suggest; since we have a lot of lwlocks I am doubtful about how important the information about non-waiting lwlocks is)

ii) Am I correct that you suggest going through MainLWLockTranche and retrieving all available lwlock information into some structure like the lwLockCell structure I've used in my patch? Something like a hash of lwlockid -> usagecount? Regards, Ilya Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Thu, Oct 2, 2014 at 5:25 AM, Craig Ringer cr...@2ndquadrant.com wrote: It's not at all clear to me that a DTrace-like (or perf-based, rather) approach is unsafe, slow, or unsuitable for production use. With appropriate wrapper tools I think we could have quite a useful library of perf-based diagnostics and tracing tools for PostgreSQL. It is not actually very slow; the overhead is quite reasonable given that we want such comprehensive performance diagnostics. As for stability, I have had a couple of issues with postgres crashing with dtrace and not without it. Most of them were on FreeBSD, which is still in use by many people, and were actually caused by FreeBSD's dtrace, but for me that is quite enough to have doubts about keeping a dtrace-aware build in production. OK, OK - maybe things have changed in the last couple of years or will change soon - still, dtrace/perf is good enough for those who are familiar with it, but you need a really convenient wrapper to make an Oracle/DB2 DBA happy with such an approach. Resolving lock IDs to names might be an issue, though. I am afraid it is -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com
[HACKERS] proposal: doc: simplify examples of dynamic SQL
Hi, there are a few less readable examples of dynamic SQL in the plpgsql docs, like:

EXECUTE 'SELECT count(*) FROM ' || tabname::regclass || ' WHERE inserted_by = $1 AND inserted = $2' INTO c USING checked_user, checked_date;

or

EXECUTE 'UPDATE tbl SET ' || quote_ident(colname) || ' = $' || newvalue || '$ WHERE key = ' || quote_literal(keyvalue);

We can show examples based on the format function only:

EXECUTE format('SELECT count(*) FROM %I' ' WHERE inserted_by = $1 AND inserted = $2', tabname) INTO c USING checked_user, checked_date;

or

EXECUTE format('UPDATE tbl SET %I = newvalue WHERE key = %L', colname, keyvalue)

or

EXECUTE format('UPDATE tbl SET %I = newvalue WHERE key = $1', colname) USING keyvalue;

The old examples are very instructive, but a little bit less readable and maybe too complex for beginners. Opinions? Regards Pavel
Re: [HACKERS] pgcrypto: PGP signatures
On 10/2/14 1:47 PM, Heikki Linnakangas wrote: I looked at this briefly, and was surprised that there is no support for signing a message without encrypting it. Is that intentional? Instead of adding a function to encrypt and sign a message, I would have expected this to just add a new function for signing, and you could then pass it an already-encrypted blob, or plaintext. Yes, that's intentional. The signatures are part of the encrypted data here, so you can't look at a message and determine who sent it. There was brief discussion about this upthread (though no one probably added any links to those discussions into the commit fest app), and I still think that both types of signing would probably be valuable. But this patch is already quite big, and I really have no desire to work on this "sign anything" functionality. The pieces are there, though, so if someone wants to do it, I don't see why they couldn't. .marko
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
* Craig Ringer (cr...@2ndquadrant.com) wrote: The patch https://commitfest.postgresql.org/action/patch_view?id=885 (discussion starts here I hope - http://www.postgresql.org/message-id/4fe8ca2c.3030...@uptime.jp) demonstrates performance problems; LWLOCK_STAT, LOCK_DEBUG and DTrace-like approach are slow, unsafe for production use and a bit clumsy for using by DBA. It's not at all clear to me that a DTrace-like (or perf-based, rather) approach is unsafe, slow, or unsuitable for production use. I've certainly had it take production systems down (perf, specifically), so I'd definitely consider it unsafe. I wouldn't say it's unusable, but it's certainly not what we should have as the end-goal for PG. Thanks, Stephen
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
* Andres Freund (and...@2ndquadrant.com) wrote: 1. I've decided to put pg_stat_lwlock into extension pg_stat_lwlock (simply for test purposes). Is it OK, or better to implement it somewhere inside pg_catalog or in another extension (for example pg_stat_statements)? I personally am doubtful that it makes much sense to move this into an extension. It'll likely be tightly enough interlinked to backend code that I don't see the point. But I'd not be surprised if others feel differently. I agree that this doesn't make sense as an extension. I generally don't think you'll get interesting data without a fair bit of additional work. I'm not sure about this.. The first problem that comes to my mind about collecting enough data is that we have a very large number of lwlocks (fixed_number + 2 * shared_buffers). One 'trivial' way of implementing this is to have a per backend array collecting the information, and then a shared one accumulating data from it over time. But I'm afraid that's not going to fly :(. Hm. With the above sets of stats that'd be ~50MB per backend... I was just going to suggest exactly this- a per-backend array which then gets pushed into a shared area periodically. Taking up 50MB per backend is quite a bit though. :/ Perhaps we should somehow encode this different for individual lwlock tranches? It's far less problematic to collect all this information for all but the buffer lwlocks... Yeah, that seems like it would at least be a good approach to begin with. Thanks, Stephen
Re: [HACKERS] Replication identifiers, take 3
On Thu, Oct 2, 2014 at 4:49 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: An origin column in the table itself helps tremendously to debug issues with the replication system. In many if not most scenarios, I think you'd want to have that extra column, even if it's not strictly required. I like a lot of what you wrote here, but I strongly disagree with this part. A good replication solution shouldn't require changes to the objects being replicated. The triggers that Slony and other current logical replication solutions use are a problem not only because they're slow (although that is a problem) but also because they represent a user-visible wart that people who don't otherwise care about the fact that their database is being replicated have to be concerned with. I would agree that some people might, for particular use cases, want to include origin information in the table that the replication system knows about, but it shouldn't be required. When you look at the replication systems that we have today, you've basically got streaming replication, which is high-performance and fairly hands-off (at least once you get it set up properly; that part can be kind of a bear) but can't cross versions let alone database systems and requires that the slaves be strictly read-only. Then on the flip side you've got things like Slony, Bucardo, and others. Some of these allow multi-master; all of them at least allow table-level determination of which server has the writable copy. Nearly all of them are cross-version and some even allow replication into non-PostgreSQL systems. But they are slow *and administratively complex*. If we're able to get something that feels like streaming replication from a performance and administrative complexity point of view but can be cross-version and allow at least some writes on slaves, that's going to be an epic leap forward for the project. 
In my mind, that means it's got to be completely hands-off from a schema design point of view: you should be able to start up a database and design it however you want, put anything you like into it, and then decide later that you want to bolt logical replication on top of it, just as you can for streaming physical replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost sfr...@snowman.net wrote: Perhaps I'm just being a bit over the top, but all this per-character work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I suppose it's not so bad, but is there no hope to increase that and make this whole process more efficient? Just a thought. I'm not sure I understand what you're getting at here. Was just thinking that we might be able to work out what needs to be done without having to actually do it on a per-character basis. That said, I'm not sure it's really worth the effort given that we're talking about at most 8 bytes currently. I had originally been thinking that we might increase the minimum size as it might make things more efficient, but it's not clear that'd really end up being the case either and, regardless, it's probably not worth worrying about at this point. I'm still not entirely sure we're on the same page. Most of the data movement for shm_mq is done via memcpy(), which I think is about as efficient as it gets. The detailed character-by-character handling only really comes up when shm_mq_send() is passed multiple chunks with lengths that are not a multiple of MAXIMUM_ALIGNOF. Then we have to fiddle a bit more. So making MAXIMUM_ALIGNOF bigger would actually cause us to do more fiddling, not less. When I originally designed this queue, I had the idea in mind that life would be simpler if the queue head and tail pointers always advanced in multiples of MAXIMUM_ALIGNOF. That didn't really work out as well as I'd hoped; maybe I would have been better off having the queue pack everything in tightly and ignore alignment. 
However, there is some possible efficiency advantage of the present system: when a message fits in the queue without wrapping, shm_mq_receive() returns a pointer to the message, and the caller can assume that message is properly aligned. If the queue didn't maintain alignment internally, you'd need to do a copy there before accessing anything inside the message that needs to be aligned. Granted, we don't have any code that takes advantage of that yet, at least not in core, but the potential is there. Still, I may have made the wrong call. But, I don't think it's the job of this patch to rework the whole framework; we can do that in the future after this has a few more users and the pros and cons of various approaches are (hopefully) more clear. It's not a complex API, so substituting a different implementation later on probably wouldn't be too hard. Anyway, it sounds like you're on board with me committing the first patch of the series, so I'll do that next week absent objections. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Oct 1, 2014 at 4:56 PM, Stephen Frost sfr...@snowman.net wrote: Was just thinking that we might be able to work out what needs to be done without having to actually do it on a per-character basis. That said, I'm not sure it's really worth the effort given that we're talking about at most 8 bytes currently. I had originally been thinking that we might increase the minimum size as it might make things more efficient, but it's not clear that'd really end up being the case either and, regardless, it's probably not worth worrying about at this point. I'm still not entirely sure we're on the same page. Most of the data movement for shm_mq is done via memcpy(), which I think is about as efficient as it gets. Right- agreed. I had originally thought we were doing things on a per-MAXIMUM_ALIGNOF-basis somewhere else, but that appears to be an incorrect assumption (which I'm glad for). The detailed character-by-character handling only really comes up when shm_mq_send() is passed multiple chunks with lengths that are not a multiple of MAXIMUM_ALIGNOF. Then we have to fiddle a bit more. So making MAXIMUM_ALIGNOF bigger would actually cause us to do more fiddling, not less. Sorry- those were two independent items. Regarding the per-character work, I was thinking we could work out the number of characters which need to be moved and then use memcpy directly rather than doing the per-character work, but as noted, we shouldn't be going through that loop very often or for very many iterations anyway, and we have to deal with moving forward through the iovs, so we'd still have to have a loop there regardless. When I originally designed this queue, I had the idea in mind that life would be simpler if the queue head and tail pointers always advanced in multiples of MAXIMUM_ALIGNOF. 
That didn't really work out as well as I'd hoped; maybe I would have been better off having the queue pack everything in tightly and ignore alignment. However, there is some possible efficiency advantage of the present system: when a message fits in the queue without wrapping, shm_mq_receive() returns a pointer to the message, and the caller can assume that message is properly aligned. If the queue didn't maintain alignment internally, you'd need to do a copy there before accessing anything inside the message that needs to be aligned. Granted, we don't have any code that takes advantage of that yet, at least not in core, but the potential is there. Still, I may have made the wrong call. But, I don't think it's the job of this patch to rework the whole framework; we can do that in the future after this has a few more users and the pros and cons of various approaches are (hopefully) more clear. It's not a complex API, so substituting a different implementation later on probably wouldn't be too hard. Agreed. Anyway, it sounds like you're on board with me committing the first patch of the series, so I'll do that next week absent objections. Works for me. Thanks! Stephen
[HACKERS] Log notice that checkpoint is to be written on shutdown
Hi, we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown. As no message besides "shutting down" is written to the server log in this case, we even had users believing the server was hanging and pondering killing it manually. In order to alert those users that a checkpoint is being written, I propose to add a log message "waiting for checkpoint ..." on shutdown, even if log_checkpoints is disabled, as this particular checkpoint might be important information. I've attached a trivial patch for this; should it be added to the next commitfest? Cheers, Michael -- Michael Banck Projektleiter / Berater Tel.: +49 (2161) 4643-171 Fax: +49 (2161) 4643-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..78483ca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
 	/*
 	 * If enabled, log checkpoint start. We postpone this until now so as not
-	 * to log anything if we decided to skip the checkpoint.
+	 * to log anything if we decided to skip the checkpoint. If we are in
+	 * shutdown and checkpoints are not being logged, add a log message that a
+	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
+	else if (flags & CHECKPOINT_IS_SHUTDOWN)
+		ereport(LOG, (errmsg("waiting for checkpoint ...")));

 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);
Re: [HACKERS] Time measurement format - more human readable
On 9/29/14, 1:08 AM, Andres Freund wrote: On 2014-09-28 20:32:30 -0400, Gregory Smith wrote: There are already a wide range of human readable time interval output formats available in the database; see the list at http://www.postgresql.org/docs/current/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE He's talking about psql's \timing... I got that. My point was that even though psql's timing report is kind of a quick thing hacked into there, if it were revised I'd expect two things will happen eventually: - Asking if any of the interval conversion code can be re-used for this purpose, rather than adding yet another "custom to one code path" standard. - Asking if this should really just be treated like a full interval instead, and then overlapping with a significant amount of that baggage so that you have all the existing format choices. That's actually a good idea. So what you're saying is that if I come up with some nice way of setting a customized time output format, keeping the default the way it is now, then it would be worth considering? Now I understand why it says that a discussion is recommended before implementing and posting. ;-) bogdan
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes. I'll see about implementing this and making sure it finds its way to 9.4beta3. Here's a patch that makes it work as proposed. How do people feel about back-patching this? On one hand it seems there's a lot of fear of changing autovacuum behavior in back branches, because for many production systems it has carefully been tuned; on the other hand, it seems hard to believe that anyone has tuned the system to work sanely given how insanely per-table options behave in the current code. I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand right now today; and that person might get an unpleasant surprise when they update. I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] port/atomics/arch-*.h are missing from installation
I got the following error when I try to build my extension against the latest master branch. Are the port/atomics/*.h files forgotten on make install?

[kaigai@magro pg_strom]$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -Wall -O0 -DPGSTROM_DEBUG=1 -I. -I./ -I/usr/local/pgsql/include/server -I/usr/local/pgsql/include/internal -D_GNU_SOURCE -c -o shmem.o shmem.c
In file included from /usr/local/pgsql/include/server/storage/barrier.h:21:0,
                 from shmem.c:18:
/usr/local/pgsql/include/server/port/atomics.h:65:36: fatal error: port/atomics/arch-x86.h: No such file or directory
 # include "port/atomics/arch-x86.h"
                                    ^
compilation terminated.
make: *** [shmem.o] Error 1

Even though the source directory has the header files...

[kaigai@magro sepgsql]$ find ./src | grep atomics
./src/include/port/atomics
./src/include/port/atomics/generic-xlc.h
./src/include/port/atomics/arch-x86.h
./src/include/port/atomics/generic-acc.h
./src/include/port/atomics/arch-ppc.h
./src/include/port/atomics/generic.h
./src/include/port/atomics/arch-hppa.h
./src/include/port/atomics/generic-msvc.h
./src/include/port/atomics/arch-ia64.h
./src/include/port/atomics/generic-sunpro.h
./src/include/port/atomics/arch-arm.h
./src/include/port/atomics/generic-gcc.h
./src/include/port/atomics/fallback.h
./src/include/port/atomics.h
./src/backend/port/atomics.c

the install destination has only atomics.h:

[kaigai@magro sepgsql]$ find /usr/local/pgsql/include | grep atomics
/usr/local/pgsql/include/server/port/atomics.h

The attached patch is probably the right remedy. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp pgsql-v9.5-fixup-makefile-for-atomics.patch Description: Binary data
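The likely fix is to add the new port/atomics subdirectory to the header-installation rule in src/include/Makefile, which copies headers subdirectory by subdirectory. A sketch of the shape of that change follows; the variable and rule names are illustrative and the verbatim upstream Makefile differs:

```makefile
# src/include/Makefile (sketch, not the verbatim upstream rule): the
# install target loops over a list of subdirectories and copies their
# headers under the server include dir. "port/atomics" has to appear in
# that list (and in the matching $(MKDIR_P) call of installdirs), or the
# arch-*.h / generic-*.h files never reach the installation.
SUBDIRS = access bootstrap catalog commands ... port port/atomics ...

install: all installdirs
	for dir in $(SUBDIRS); do \
	  $(INSTALL_DATA) $(srcdir)/$$dir/*.h '$(DESTDIR)$(includedir_server)'/$$dir || exit; \
	done
```

Without the subdirectory in both lists, atomics.h itself installs (it lives directly in port's parent list) while everything it includes from port/atomics/ is silently skipped, producing exactly the build failure above.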
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On 2014-09-26 10:28:21 -0400, Robert Haas wrote: On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa o...@ohmu.fi wrote: So you think a read barrier is the same thing as an acquire barrier and a write barrier is the same as a release barrier? That would be surprising. It's certainly not true in general. The above doc describes the difference: read barrier requires loads before the barrier to be completed before loads after the barrier - an acquire barrier is the same, but it also requires loads to be complete before stores after the barrier. Similarly write barrier requires stores before the barrier to be completed before stores after the barrier - a release barrier is the same, but it also requires loads before the barrier to be completed before stores after the barrier. So acquire is read + loads-before-stores and release is write + loads-before-stores. Hmm. My impression was that an acquire barrier means that loads and stores can migrate forward across the barrier but not backward; and that a release barrier means that loads and stores can migrate backward across the barrier but not forward. It's actually more complex than that :( Simple things first: Oracle's definition seems pretty iron clad: http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html __machine_acq_barrier is a clear superset of __machine_r_barrier and __machine_rel_barrier is a clear superset of __machine_w_barrier And that's what we're essentially discussing, no? That said, there seems to be no reason to avoid using __machine_r/w_barrier(). But for the reason why I defined pg_read_barrier/write_barrier to __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): The C11/C++11 definition it's made for is hellishly hard to understand. There's very subtle differences between acquire/release operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant parts of the standards. I think it essentially guarantees the mapping we're talking about, but it's not entirely clear. 
The way acquire/release fences are defined is that they form a 'synchronizes-with' relationship with each other. Which would, I think, be sufficient given that without a release-like operation on the other thread a read/write barrier isn't worth much. But there's a rub in that it requires an atomic operation involved somewhere to give that guarantee. I *did* check that the emitted code on relevant architectures is sane, but that doesn't guarantee anything for the future. Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is definitely guaranteeing what we need, even if superfluously heavy on some platforms. It still is significantly more efficient than __sync_synchronize() which is what was used before. I.e. it generates no code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync otherwise, although I don't know why) and similar on ia64. As a reference, relevant standard sections are: C11: 5.1.2.4 5); 7.17.4 C++11: 29.3; 1.10 Not that we can rely on those, but I think it's a good thing to orient on. I'm actually not really sure what this means unless the barrier also does something in and of itself. For example, consider this:

some stuff
CAS(lock, 0, 1) // i am an acquire barrier
more stuff
lock = 0 // i am a release barrier
even more stuff

If the CAS() and lock = 0 instructions were FULL barriers, then we'd be saying that the stuff that happens in the critical section needs to be exactly "more stuff". But if they are acquire and release barriers, respectively, then the CPU is allowed to move "some stuff" or "even more stuff" into the critical section; but what it can't do is move "more stuff" out. Now if you just have a naked acquire barrier that is not doing anything itself, I don't really know what the semantics of that should be. Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot.
Say I want to appear to only change things while flag is 1, so I write this code:

    flag = 1
    acquire barrier
    things++
    release barrier
    flag = 0

With the definition you (and Oracle) propose (As written above, I don't think that applies to Oracle's definition?) this won't work, because there's nothing to keep the modification of things from being reordered before flag = 1. What good is that? Apparently, I don't have any idea! I hope it's a bit clearer now? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 10:42:29 -0400, Robert Haas wrote: On Thu, Sep 25, 2014 at 10:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-25 10:22:47 -0400, Robert Haas wrote: On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund and...@2ndquadrant.com wrote: That leads me to wonder: Have you measured different, lower numbers of buffer mapping locks? 128 locks is, if we'd, as we should, align them properly, 8KB of memory. Common L1 cache sizes are around 32k... Amit has some results upthread showing 64 being good, but not as good as 128. I haven't verified that myself, but have no reason to doubt it. How about you push the spinlock change and I crosscheck the partition number on a multi-socket x86 machine? Seems worthwhile to make sure that it doesn't cause problems on x86. I seriously doubt it'll, but ... OK. Given that the results look good, do you plan to push this? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? If so, and if you don't have any reservations, yeah I'll go change it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-02 10:40:30 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? Yes. Now that I think about it I wonder if we shouldn't define MAX_SIMUL_LWLOCKS like #define MAX_SIMUL_LWLOCKS (NUM_BUFFER_PARTITIONS + 64) or something like that? If so, and if you don't have any reservations, yeah I'll go change it. Yes, I'm happy with going forward. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote: It's actually more complex than that :( Simple things first: Oracle's definition seems pretty iron clad: http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html __machine_acq_barrier is a clear superset of __machine_r_barrier and __machine_rel_barrier is a clear superset of __machine_w_barrier And that's what we're essentially discussing, no? That said, there seems to be no reason to avoid using __machine_r/w_barrier(). So let's use those, then. But for the reason why I defined pg_read_barrier/write_barrier to __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): The C11/C++11 definition it's made for is hellishly hard to understand. There's very subtle differences between acquire/release operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant parts of the standards. I think it essentially guarantees the mapping we're talking about, but it's not entirely clear. The way acquire/release fences are defined is that they form a 'synchronizes-with' relationship with each other. Which would, I think, be sufficient given that without a release-like operation on the other thread a read/write barrier isn't worth much. But there's a rub in that it requires an atomic operation involved somewhere to give that guarantee. I *did* check that the emitted code on relevant architectures is sane, but that doesn't guarantee anything for the future. Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is definitely guaranteeing what we need, even if superfluously heavy on some platforms. It still is significantly more efficient than __sync_synchronize() which is what was used before. I.e. it generates no code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync otherwise, although I don't know why) and similar on ia64. A full barrier on x86 should be an mfence, right? 
With only a compiler barrier, you have loads ordered with respect to loads and stores ordered with respect to stores, but the load/store ordering isn't fully defined. Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot. But, uh, we still don't seem to know what those guarantees actually ARE. Say I want to appear to only change things while flag is 1, so I write this code:

    flag = 1
    acquire barrier
    things++
    release barrier
    flag = 0

With the definition you (and Oracle) propose this won't work, because there's nothing to keep the modification of things from being reordered before flag = 1. What good is that? Apparently, I don't have any idea! As written above, I don't think that applies to Oracle's definition? Oracle's definition doesn't look sufficient there. The acquire barrier guarantees that the load operations before the barrier will be completed before the load and store operations after the barrier, but the only operation before the barrier is a store, not a load, so it guarantees nothing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Oct 2, 2014 at 10:44 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-02 10:40:30 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? Yes. Now that I think about it I wonder if we shouldn't define MAX_SIMUL_LWLOCKS like #define MAX_SIMUL_LWLOCKS (NUM_BUFFER_PARTITIONS + 64) or something like that? Nah. That assumes NUM_BUFFER_PARTITIONS will always be the biggest thing, and I don't see any reason to assume that, even if we're making it true for now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] port/atomics/arch-*.h are missing from installation
Hi, On 2014-10-02 23:33:36 +0900, Kohei KaiGai wrote: I got the following error when I tried to build my extension against the latest master branch. Were the port/atomics/*.h files forgotten in make install? You're right. The attached patch is probably the right remedy. I've changed the order to be alphabetic, but otherwise it looks good. Pushed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
* Robert Haas (robertmh...@gmail.com) wrote: On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes. I'll see about implementing this and making sure it finds its way to 9.4beta3. Here's a patch that makes it work as proposed. How do people feel about back-patching this? On one hand it seems there's a lot of fear of changing autovacuum behavior in back branches, because for many production systems it has carefully been tuned; on the other hand, it seems hard to believe that anyone has tuned the system to work sanely given how insanely per-table options behave in the current code. I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand today; and that person might get an unpleasant surprise when they update. I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation. I'm fine with putting it into 9.4. I'm not sure that I see the value in changing the back-branches and then having to deal with the "well, if you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to figure out how to deal with the documentation for this. Has there been any thought as to what pg_upgrade should do..? Thanks, Stephen
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Michael Banck-2 wrote: Hi, we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown. As no message besides "shutting down" is written to the server log in this case, we even had users believing the server was hanging and pondering killing it manually. In order to alert those users that a checkpoint is being written, I propose to add a log message "waiting for checkpoint ..." on shutdown, even if log_checkpoints is disabled, as this particular checkpoint might be important information. I've attached a trivial patch for this, should it be added to the next commitfest? Peeking at this provokes a couple of novice questions: While apparently it is impossible to have a checkpoint to log at recovery end, the decision to use the bitwise-OR instead of the local shutdown bool seemed odd at first... LogCheckpointStart creates the log entry unconditionally - leaving the caller responsible for evaluating log_checkpoints - while LogCheckpointEnd has the log_checkpoints condition built into it. I mention this because by the same argument advocating the logging of the checkpoint start we should also log its end. In order to do that here, though, we have to change log_checkpoints before calling LogCheckpointEnd. Now, since we're going to shut down anyway that seems benign (and forecloses on any need to set it back to the prior value) but it's still ugly. Just some thoughts... The rationale makes perfect sense. +1 David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Log-notice-that-checkpoint-is-to-be-written-on-shutdown-tp5821394p5821417.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On 2014-10-02 10:55:06 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote: It's actually more complex than that :( Simple things first: Oracle's definition seems pretty iron clad: http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html __machine_acq_barrier is a clear superset of __machine_r_barrier and __machine_rel_barrier is a clear superset of __machine_w_barrier And that's what we're essentially discussing, no? That said, there seems to be no reason to avoid using __machine_r/w_barrier(). So let's use those, then. Right, I've never contended that. But for the reason why I defined pg_read_barrier/write_barrier to __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE): The C11/C++11 definition it's made for is hellishly hard to understand. There's very subtle differences between acquire/release operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant parts of the standards. I think it essentially guarantees the mapping we're talking about, but it's not entirely clear. The way acquire/release fences are defined is that they form a 'synchronizes-with' relationship with each other. Which would, I think, be sufficient given that without a release-like operation on the other thread a read/write barrier isn't worth much. But there's a rub in that it requires an atomic operation involved somewhere to give that guarantee. I *did* check that the emitted code on relevant architectures is sane, but that doesn't guarantee anything for the future. Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is definitely guaranteeing what we need, even if superfluously heavy on some platforms. It still is significantly more efficient than __sync_synchronize() which is what was used before. I.e. it generates no code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync otherwise, although I don't know why) and similar on ia64. A full barrier on x86 should be an mfence, right? Right. 
I've not talked about changing full barrier semantics. What I was referring to is that until the atomics patch we always redefined read/write barriers to be full barriers when using gcc intrinsics. With only a compiler barrier, you have loads ordered with respect to loads and stores ordered with respect to stores, but the load/store ordering isn't fully defined. Yes. Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot. But, uh, we still don't seem to know what those guarantees actually ARE. Paired together they form a synchronized-with relationship. Problem #1 is that the standard's language isn't, to me at least, clear on whether there's some case where that doesn't hold. Problem #2 is that our current README.barrier definition doesn't actually require barriers to be paired. Which imo is bad, but still a fact. The definition of ACQ_REL is pretty clearly sufficient imo: Full barrier in both directions and synchronizes with acquire loads and release stores in another thread. Say I want to appear to only change things while flag is 1, so I write this code:

    flag = 1
    acquire barrier
    things++
    release barrier
    flag = 0

With the definition you (and Oracle) propose this won't work, because there's nothing to keep the modification of things from being reordered before flag = 1. What good is that? Apparently, I don't have any idea! As written above, I don't think that applies to Oracle's definition? Oracle's definition doesn't look sufficient there. Perhaps I'm just not understanding what you want to show with this example. This started as a discussion of comparing acquire/release with read/write barriers, right? Or are you generally wondering about the point of acquire/release barriers? 
The acquire barrier guarantees that the load operations before the barrier will be completed before the load and store operations after the barrier, but the only operation before the barrier is a store, not a load, so it guarantees nothing. Well, 'acquire' operations always have to relate to a load. That's why standalone 'acquire fences' or 'acquire barriers' are more heavyweight than just an acquiring read. And realistically, in the above example, you'd have to read flag to see that it's not already 1, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-02 10:56:05 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:44 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-02 10:40:30 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? Yes. Now that I think about it I wonder if we shouldn't define MAX_SIMUL_LWLOCKS like #define MAX_SIMUL_LWLOCKS (NUM_BUFFER_PARTITIONS + 64) or something like that? Nah. That assumes NUM_BUFFER_PARTITIONS will always be the biggest thing, and I don't see any reason to assume that, even if we're making it true for now. The reason I'm suggesting it is that NUM_BUFFER_PARTITIONS (and NUM_LOCK_PARTITIONS) are the cases where we can expect many lwlocks to be held at the same time. It doesn't seem friendly to require users experimenting with changing this to know about a define that's private to lwlock.c. But I'm fine with doing this not at all or separately - there's no need to actually do it together. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote: So let's use those, then. Right, I've never contended that. OK, cool. A full barrier on x86 should be an mfence, right? Right. I've not talked about changing full barrier semantics. What I was referring to is that until the atomics patch we always redefined read/write barriers to be full barriers when using gcc intrinsics. OK, got it. If there's a cheaper way to tell gcc "loads before loads" or "stores before stores", I'm fine with doing that for those cases. Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot. But, uh, we still don't seem to know what those guarantees actually ARE. Paired together they form a synchronized-with relationship. Problem #1 is that the standard's language isn't, to me at least, clear on whether there's some case where that doesn't hold. Problem #2 is that our current README.barrier definition doesn't actually require barriers to be paired. Which imo is bad, but still a fact. I don't know what a synchronized-with relationship means. Also, I pretty much designed those definitions to match what Linux does. And it doesn't require that either, though it says that in most cases it will work out that way. The definition of ACQ_REL is pretty clearly sufficient imo: Full barrier in both directions and synchronizes with acquire loads and release stores in another thread. I dunno. What's an acquire load? What's a release store? I know what loads and stores are; I don't know what the adjectives mean. The acquire barrier guarantees that the load operations before the barrier will be completed before the load and store operations after the barrier, but the only operation before the barrier is a store, not a load, so it guarantees nothing. 
Well, 'acquire' operations always have to relate to a load. That's why standalone 'acquire fences' or 'acquire barriers' are more heavyweight than just an acquiring read. Again, I can't judge any of this, because you haven't defined the terms anywhere. And realistically, in the above example, you'd have to read flag to see that it's not already 1, right? Not necessarily. You could be the only writer. Think about the way the backend entries in the stats system work. The point of setting the flag may be for other people to know whether the data is in the middle of being modified. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand right now today; and that person might get an unpleasant surprise when they update. I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation. I'm fine with putting it into 9.4. I'm not sure that I see the value in changing the back-branches and then having to deal with the "well, if you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to figure out how to deal with the documentation for this. Well, the value obviously is that we would fix the bug that Mark Kirkwood reported that started this thread. Basically, if you are on 9.3.5 or earlier, any per-table options for autovacuum cost delay will misbehave (meaning: any such table will be processed with settings flattened according to balancing of the standard options, _not_ the configured ones). If you are on 9.3.6 or newer they will behave as described in the docs. Has there been any thought as to what pg_upgrade should do..? Yes, I'm thinking there's nothing it should do. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Hi, On Thursday, 2014-10-02 at 08:17 -0700, David G Johnston wrote: Michael Banck-2 wrote: I've attached a trivial patch for this, should it be added to the next commitfest? Peeking at this provokes a couple of novice questions: While apparently it is impossible to have a checkpoint to log at recovery end, the decision to use the bitwise-OR instead of the local shutdown bool seemed odd at first... Good point, using `shutdown' is probably better. I guess that local bool had escaped my notice when I first had a look at this a while ago. LogCheckpointStart creates the log entry unconditionally - leaving the caller responsible for evaluating log_checkpoints - while LogCheckpointEnd has the log_checkpoints condition built into it. I noticed this as well. My guess would be that this is because LogCheckpointEnd() also updates the BgWriterStats statistics, and should do that every time, not just when the checkpoint is getting logged. Whether or not log_checkpoints is set (and logging should happen) is evaluated directly after that. Some refactoring of this might be useful (or there might be a very good reason why this was done like this), but this seems outside the scope of this patch. AIUI, shutdown will not be further delayed after the checkpoint is finished, so no further logging is required for the purpose of this patch. The rationale makes perfect sense. +1 Cool! I have attached an updated patch using the `shutdown' bool. Michael -- Michael Banck Projektleiter / Berater Tel.: +49 (2161) 4643-171 Fax: +49 (2161) 4643-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. 
Michael Meskes, Jörg Folz, Sascha Heuer

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..f2716ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
 	/*
 	 * If enabled, log checkpoint start. We postpone this until now so as not
-	 * to log anything if we decided to skip the checkpoint.
+	 * to log anything if we decided to skip the checkpoint. If we are during
+	 * shutdown and checkpoints are not being logged, add a log message that a
+	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
+	else if (shutdown)
+		ereport(LOG, (errmsg("waiting for checkpoint ...")));
 
 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DDL Damage Assessment
Hi fellow hackers, I would like to work on a new feature allowing our users to assess the amount of trouble they will run into when running a DDL script on their production setups, *before* actually getting their services down. The main practical example I can offer here is the ALTER TABLE command. Recent releases are including very nice optimisations to it, so much so that it's becoming increasingly hard to answer some very basic questions: - what kind of locks will be taken? (exclusive, shared) - on what objects? (foreign keys, indexes, sequences, etc) - will the table have to be rewritten? the indexes? Of course the docs are answering parts of those, but in particular the table rewriting rules are complex enough that “accidental DBAs” will fail to predict if the target data type is binary coercible to the current one. Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? 2. What do you think such a feature should look like? 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi fellow hackers, I would like to work on a new feature allowing our users to assess the amount of trouble they will run into when running a DDL script on their production setups, *before* actually getting their services down. The main practical example I can offer here is the ALTER TABLE command. Recent releases are including very nice optimisations to it, so much so that it's becoming increasingly hard to answer some very basic questions: - what kind of locks will be taken? (exclusive, shared) - on what objects? (foreign keys, indexes, sequences, etc) - will the table have to be rewritten? the indexes? Of course the docs are answering parts of those, but in particular the table rewriting rules are complex enough that “accidental DBAs” will fail to predict if the target data type is binary coercible to the current one. Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? 2. What do you think such a feature should look like? 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. What you are proposing is some kind of dry-run with verbose output? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 1:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi fellow hackers, I would like to work on a new feature allowing our users to assess the amount of trouble they will run into when running a DDL script on their production setups, *before* actually getting their services down. The main practical example I can offer here is the ALTER TABLE command. Recent releases are including very nice optimisations to it, so much so that it's becoming increasingly hard to answer some very basic questions: - what kind of locks will be taken? (exclusive, shared) - on what objects? (foreign keys, indexes, sequences, etc) - will the table have to be rewritten? the indexes? Of course the docs are answering parts of those, but in particular the table rewriting rules are complex enough that “accidental DBAs” will fail to predict if the target data type is binary coercible to the current one. Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? 2. What do you think such a feature should look like? 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. What you are proposing is some kind of dry-run with verbose output? EXPLAIN ALTER TABLE ? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
I think the main issue is when a table rewrite is triggered on a DDL command on a large table, as this is what frequently leads to unavailability. The idea of introducing a NOREWRITE keyword to DDL commands then came up (credit: Peter Geoghegan). When the NOREWRITE keyword is used and the DDL statement would rewrite the table, the command errors and exits. This would allow ORM and framework authors to include the NOREWRITE option by default, only to be disabled on a per-statement basis by the developer, once they have assessed that it may be safe or otherwise they still want to proceed with this. The workflow for an app developer then becomes: * Write offending data migration (eg: add a column with a NOT NULL constraint and default value) * Test it locally, either by running automated test suite or running on staging * See that it fails because of NOREWRITE option * Assess situation. If it's a small table, or I still want to ignore, override the option. Or rewrite migration to avoid rewrite. * Repeat I like this a lot just because it's simple, limited in scope, and can be easily integrated into ORMs saving users hours of downtime and frustration. Thoughts? On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi fellow hackers, I would like to work on a new feature allowing our users to assess the amount of trouble they will run into when running a DDL script on their production setups, *before* actually getting their services down. The main practical example I can offer here is the ALTER TABLE command. Recent releases are including very nice optimisations to it, so much so that it's becoming increasingly hard to answer some very basic questions: - what kind of locks will be taken? (exclusive, shared) - on what objects? (foreign keys, indexes, sequences, etc) - will the table have to be rewritten? the indexes? 
Of course the docs are answering parts of those, but in particular the table rewriting rules are complex enough that “accidental DBAs” will fail to predict if the target data type is binary coercible to the current one. Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? 2. What do you think such a feature should look like? 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. What you are proposing is some kind of dry-run with verbose output? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
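Harold's proposed option might look something like this (a hypothetical sketch — NOREWRITE is not an existing PostgreSQL keyword, and the error wording is invented):

```sql
-- Hypothetical: fail instead of silently rewriting the whole table.
ALTER TABLE users
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now(),
    NOREWRITE;
-- ERROR:  ALTER TABLE would rewrite table "users"

-- After assessing the impact, the developer overrides by dropping the option:
ALTER TABLE users
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();
```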
Re: [HACKERS] Scaling shared buffer eviction
On 10/02/2014 05:40 PM, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? Hmm, do we actually ever need to hold all the buffer partition locks at the same time? At a quick search for NUM_BUFFER_PARTITIONS in the code, I couldn't find any place where we'd do that. I bumped up NUM_BUFFER_PARTITIONS to 128, but left MAX_SIMUL_LWLOCKS at 100, and did make check. It passed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-02 20:04:58 +0300, Heikki Linnakangas wrote: On 10/02/2014 05:40 PM, Robert Haas wrote: On Thu, Oct 2, 2014 at 10:36 AM, Andres Freund and...@2ndquadrant.com wrote: OK. Given that the results look good, do you plan to push this? By this, you mean the increase in the number of buffer mapping partitions to 128, and a corresponding increase in MAX_SIMUL_LWLOCKS? Hmm, do we actually ever need to hold all the buffer partition locks at the same time? At a quick search for NUM_BUFFER_PARTITIONS in the code, I couldn't find any place where we'd do that. I bumped up NUM_BUFFER_PARTITIONS to 128, but left MAX_SIMUL_LWLOCKS at 100, and did make check. It passed. Do a make check-world and it'll hopefully fail ;). Check pg_buffercache_pages.c. I'd actually quite like to have a pg_buffercache version that, at least optionally, doesn't do this, but that's a separate thing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
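For reference, the reason pg_buffercache needs MAX_SIMUL_LWLOCKS >= NUM_BUFFER_PARTITIONS is that it takes every buffer mapping partition lock at once to get a consistent snapshot. An abbreviated sketch from memory of what pg_buffercache_pages.c does (not the verbatim source):

```c
/* Hold ALL buffer mapping partition locks at once so the scan of the
 * buffer headers sees a consistent picture -- hence the MAX_SIMUL_LWLOCKS
 * concern if NUM_BUFFER_PARTITIONS is raised past it. */
for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
    LWLockAcquire(FirstBufMappingLock + i, LW_SHARED);

/* ... copy out the buffer header data ... */

for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
    LWLockRelease(FirstBufMappingLock + i);
```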
Re: [HACKERS] DDL Damage Assessment
On 10/02/2014 11:30 AM, Dimitri Fontaine wrote: Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? +1 I really like the idea and would find it useful/time-saving 2. What do you think such a feature should look like? Elsewhere on this thread EXPLAIN was suggested. That makes a certain amount of sense. Maybe something like EXPLAIN IMPACT [...] 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Yes, I think it should cover all commands that can have an availability impact. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] Inefficient barriers on solaris with sun cc
On 2014-10-02 11:35:32 -0400, Robert Haas wrote: On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote: Which is why these acquire/release fences, in contrast to acquire/release operations, have more guarantees... You put your finger right onto the spot. But, uh, we still don't seem to know what those guarantees actually ARE. Paired together they form a synchronized-with relationship. Problem #1 is that the standard's language isn't, to me at least, clear if there's not some case where that's not the case. Problem #2 is that our current README.barrier definition doesn't actually require barriers to be paired. Which imo is bad, but still a fact. I don't know what a synchronized-with relationship means. I'm using the standard's language here, given that I'm trying to reason about its behaviour... What it means is that if you have a matching pair of acquire/release operations or barriers/fences everything that happened *before* the last release fence will be visible *after* executing the next acquire operation in a different thread-of-execution. And 'after' is defined in the way that is true if the 'acquiring' thread can see the result of the 'releasing' operation. I.e. no loads after the acquire can see values from before the release. My problem with the definition in the standard is that it's not particularly clear how acquire fences *without* a underlying explicit atomic operation are defined in the standard. I checked gcc's current code and it's fine in that regard. 
Also other popular concurrent open source stuff like http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD does precisely what I'm talking about:

#ifndef smp_wmb
#ifdef __ATOMIC_RELEASE
#define smp_wmb()   __atomic_thread_fence(__ATOMIC_RELEASE)
#else
#define smp_wmb()   __sync_synchronize()
#endif
#endif

#ifndef smp_rmb
#ifdef __ATOMIC_ACQUIRE
#define smp_rmb()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#else
#define smp_rmb()   __sync_synchronize()
#endif
#endif

The commit that added it http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88 was written by one gcc guy and reviewed by another one... So I think we can be pretty sure that gcc's __atomic_thread_fence() behaves like we want. We probably have to be a bit more careful about extending that definition (by including atomic.h and doing atomic_thread_fence(memory_order_acquire)) to use general C11. Which is probably a couple years away anyway. Also, I pretty much designed those definitions to match what Linux does. And it doesn't require that either, though it says that in most cases it will work out that way. My point is that read barriers aren't particularly meaningful without a defined store order from another thread/process. Without any form of pairing you don't have that. The writing side could just have reordered the writes in a way you didn't want them. And the kernel docs do say "A lack of appropriate pairing is almost certainly an error." But since read barriers also pair with lock release operations, that's normally not a big problem. The definition of ACQ_REL is pretty clearly sufficient imo: "Full barrier in both directions and synchronizes with acquire loads and release stores in another thread." I dunno. What's an acquire load? What's a release store? I know what loads and stores are; I don't know what the adjectives mean.
An acquire load is either an explicit atomic load (tas, cmpxchg, etc. also count) or a normal load combined with an acquire barrier. The symmetric definition is true for a release store. (So, on x86, every load/store that prevents compiler reordering is essentially an acquire load/release store.) And realistically, in the above example, you'd have to read flag to see that it's not already 1, right? Not necessarily. You could be the only writer. Think about the way the backend entries in the stats system work. The point of setting the flag may be for other people to know whether the data is in the middle of being modified. So you're thinking about something seqlock-like... Isn't the problem then that you actually don't want acquire semantics, but release or write barrier semantics on that store? The acquire/read barrier part would be on the reader side, no? I'm still unsure what you want to show with that example? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] DDL Damage Assessment
On 10/02/2014 09:30 AM, Dimitri Fontaine wrote: Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? I would say it is late to the game and a great feature. 2. What do you think such a feature should look like? I liked the other post that said: EXPLAIN ALTER TABLE or whatever. Heck, it could even be useful to have EXPLAIN ANALYZE ALTER TABLE in case people want to run it on staging/test/dev environments to judge impact. 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? I would think that introducing this incrementally makes sense. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc If we send our children to Caesar for their education, we should not be surprised when they come back as Romans.
[HACKERS] TAP test breakage on MacOS X
make check-world dies ingloriously for me, like this:

/bin/sh ../../../config/install-sh -c -d tmp_check/log
make -C ../../.. DESTDIR='/Users/rhaas/pgsql/src/bin/initdb'/tmp_check/install install >'/Users/rhaas/pgsql/src/bin/initdb'/tmp_check/log/install.log 2>&1
cd . && TESTDIR='/Users/rhaas/pgsql/src/bin/initdb' PATH=/Users/rhaas/pgsql/src/bin/initdb/tmp_check/install/Users/rhaas/project/bin:$PATH DYLD_LIBRARY_PATH='/Users/rhaas/pgsql/src/bin/initdb/tmp_check/install/Users/rhaas/project/lib' PGPORT='65432' prove --ext='.pl' -I ../../../src/test/perl/ --verbose t/
t/001_initdb.pl ..
1..14
    1..3
    ok 1 - initdb --help exit code 0
    ok 2 - initdb --help goes to stdout
    ok 3 - initdb --help nothing to stderr
ok 1 - initdb --help
    1..3
    ok 1 - initdb --version exit code 0
    ok 2 - initdb --version goes to stdout
    ok 3 - initdb --version nothing to stderr
ok 2 - initdb --version
    1..2
    ok 1 - initdb with invalid option nonzero exit code
    ok 2 - initdb with invalid option prints error message
    # Looks like your test exited with 256 just after 2.
not ok 3 - initdb options handling
#   Failed test 'initdb options handling'
#   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.
ok 4 - basic initdb
ok 5 - existing data directory
ok 6 - nosync
ok 7 - sync only
ok 8 - sync missing data directory
ok 9 - existing empty data directory
ok 10 - separate xlog directory
ok 11 - relative xlog directory not allowed
ok 12 - existing empty xlog directory
ok 13 - existing nonempty xlog directory
ok 14 - select default dictionary
# Looks like you failed 1 test of 14.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/14 subtests

Test Summary Report
-------------------
t/001_initdb.pl (Wstat: 256 Tests: 14 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=1, Tests=14, 23 wallclock secs ( 0.02 usr  0.01 sys +  9.57 cusr  3.37 csys = 12.97 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-initdb-recurse] Error 2
make: *** [check-world-src/bin-recurse] Error 2

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Scaling shared buffer eviction
On Thu, Oct 2, 2014 at 1:07 PM, Andres Freund and...@2ndquadrant.com wrote: Do a make check-world and it'll hopefully fail ;). Check pg_buffercache_pages.c. Yep. Committed, with an update to the comments in lwlock.c to allude to the pg_buffercache issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proper query implementation for Postgresql driver
On Tue, Sep 30, 2014 at 1:20 AM, Craig Ringer cr...@2ndquadrant.com wrote: Frankly, I suggest dropping simple entirely and using only the parse/bind/describe/execute flow in the v3 protocol. The last time I checked, that was significantly slower. http://www.postgresql.org/message-id/ca+tgmoyjkfnmrtmhodwhnoj1jwcgzs_h1r70ercecrwjm65...@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? I definitely like the idea of such a 'dry-run' kind of operation to get an idea of what would happen. 2. What do you think such a feature should look like? My thinking is that this would be implemented as a new kind of read-only transaction type. 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? On the fence about this one.. In general, I'd say yes, but I've not looked at every case and I imagine there are DDL commands which really aren't all that interesting for this case. Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. That feels a bit ambitious, given that we've not yet really nailed down the feature definition, but I do like where you're going. :) Thanks! Stephen
Re: [HACKERS] DDL Damage Assessment
* Harold Giménez (har...@heroku.com) wrote: I think the main issue is when a table rewrite is triggered on a DDL command on a large table, as this is what frequently leads to unavailability. The idea of introducing a NOREWRITE keyword to DDL commands then came up (credit: Peter Geoghegan). When the NOREWRITE keyword is used and the DDL statement would rewrite the table, the command errors and exits. This would allow ORM and framework authors to include the NOREWRITE option by default, only to be disabled on a per-statement basis by the developer, once they have assessed that it may be safe or otherwise they still want to proceed with this. The workflow for an app developer then becomes: * Write offending data migration (eg: add a column with a NOT NULL constraint and default value) * Test it locally, either by running automated test suite or running on staging * See that it fails because of NOREWRITE option * Assess situation. If it's a small table, or I still want to ignore, override the option. Or rewrite migration to avoid rewrite. * Repeat I like this a lot just because it's simple, limited in scope, and can be easily integrated into ORMs saving users hours of downtime and frustration. Thoughts? Not against it, but feels like an independent thing to consider - what Dimitri is suggesting is broader and encompasses the issue of locks, which are certainly important to consider also. In short, seems like having both would be worthwhile. Thanks, Stephen
Re: [HACKERS] DDL Damage Assessment
* Joshua D. Drake (j...@commandprompt.com) wrote: 2. What do you think such a feature should look like? I liked the other post that said: EXPLAIN ALTER TABLE or whatever. Heck it could even be useful to have EXPLAIN ANALYZE ALTER TABLE in case people want to run it on staging/test/dev environments to judge impact. The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefix every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? I would think that introducing this incrementally makes sense. Agreed. Thanks, Stephen
Re: [HACKERS] NEXT VALUE FOR sequence
On Thu, Oct 2, 2014 at 7:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: SQL:2003 introduced the function NEXT VALUE FOR sequence. Google tells me that at least DB2, SQL Server and a few niche databases understand it so far. As far as I can tell there is no standardised equivalent of currval and setval (but I only have access to second hand information about the standard, like articles and the manuals of other products). Here is a starter patch to add it. To avoid a shift/reduce conflict, I had to reclassify the keyword NEXT. I admit that I don't fully understand the consequences of that change! Please let me know if you think this could fly. Looks correct. Of course, it's annoying to have to reserve the NEXT keyword (as a type_func_name_keyword, not fully reserved). One way to avoid that is to collapse NEXT VALUE FOR into a single token in parser.c. We do that for a few other word pairs: NULLS FIRST, NULLS LAST, WITH TIME and WITH ORDINALITY. In this case you'd need to look-ahead three tokens, not two, but I guess that'd be doable. Those kinds of hacks are not scalable. It's not too bad right now because NULLS, FIRST, and LAST are all rarely-used keywords and there's rarely a reason for FIRST and LAST to follow NULLS except in the exact context we care about. But the more we extend those hacks, the more likely it is that the lexer will smash the tokens in some case where the user actually meant something else. Hacking the lexer to get around grammar conflicts doesn't actually fix whatever intrinsic semantic conflict exists; it just keeps bison from knowing about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
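For anyone unfamiliar with the standard form under discussion, a sketch of what the patch would add next to the existing spelling (NEXT VALUE FOR does not exist in PostgreSQL today; only nextval() does):

```sql
CREATE SEQUENCE my_seq;

-- SQL:2003 syntax the patch proposes to add:
SELECT NEXT VALUE FOR my_seq;

-- Existing, equivalent PostgreSQL spelling:
SELECT nextval('my_seq');
```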
Re: [HACKERS] DDL Damage Assessment
Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? Yes. 2. What do you think such a feature should look like? As with others, I think EXPLAIN is a good way to do this without adding a keyword. So you'd do: EXPLAIN ALTER TABLE ... and it would produce a bunch of actions, available in either text or JSON formats. For example:

{ locks : [
    { lock_type: relation,
      relation: table1,
      lock_mode: ACCESS EXCLUSIVE },
    { lock_type: transaction },
    { lock_type: catalog,
      catalogs: [pg_class, pg_attribute, pg_statistic],
      lock_mode: EXCLUSIVE }
] }
{ writes : [
    { object: relation files, action: rewrite },
    { object: catalogs, action: update }
] }
... etc.

Would need a lot of refinement, but you get the idea. 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Well, eventually we'd want to support all of them just to avoid having things be weird for users. However, here's a priority order:

ALTER TABLE
CREATE TABLE
DROP TABLE
ALTER VIEW
CREATE VIEW
CREATE INDEX
DROP INDEX

... since all of the above can have unexpected secondary effects on locking. For example, if you create a table with FKs it will take an ACCESS EXCLUSIVE lock on the FK targets. And if you DROP a partition, it takes an A.E. lock on the parent table. Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. Great! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 2, 2014 at 12:54 AM, Peter Geoghegan p...@heroku.com wrote: I've started off by adding varied examples of the use of the existing proposed syntax. I'll expand on this soon. I spent some time today expanding on the details, and commenting on the issues around the custom syntax (exactly what it does, issues around ambiguous conflict-on unique indexes, etc). Also, Challenges, issues has been updated - I've added some secondary concerns. These are non-contentious items that I think are of concern, and would like to highlight now. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote: Hi fellow hackers, [snip] Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? Yes, please 2. What do you think such a feature should look like? EXPLAIN [(verbose, format)] [DDL_COMMAND] as in:

EXPLAIN (verbose on, format text, impact on) ALTER TABLE emp ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

where the output would include something like:

...
EXCLUSIVE LOCK ON TABLE emp;                                        // due to IMPACT ON
REWRITE TABLE emp due to adding column foo2 (default='{}'::jsonb)   // due to VERBOSE on
...

3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? For completeness sake, yes. But, unless the impact and verbose modifiers are specified, most would be quite self-explanatory:

EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
Execution plan:
- EXCLUSIVE LOCK ON TABLE emp;
- truncate index: II (file=N)             // = relfilenode
- truncate main fork: N (tablespace: T)   // = relfilenode
- truncate visibility map
- RELEASE LOCK ON TABLE emp;
Summary: Z pages ( MMM MB ) would be freed

versus a simple:

EXPLAIN TRUNCATE TABLE emp;
Execution plan:
- truncate index: emp_pkey
- truncate index: emp_foo2_idx
- truncate relation emp

Provided that we are able to converge towards a common enough answer to those questions, I propose to hack my way around and send patches to have it (the common answer) available in the next PostgreSQL release. Sounds very good, indeed. Count on me as tester :) -- José Luis Tallón
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Tue, Sep 30, 2014 at 02:57:43PM -0700, Josh Berkus wrote: I don't know that that is the *expectation*. However, I personally would find it *acceptable* if it meant that we could get efficient merge semantics on other aspects of the syntax, since my primary use for MERGE is bulk loading. Regardless, I don't think there's any theoretical way to support UPSERT without a unique constraint. Therefore eventual support of this would require a full table lock. Therefore having it use the same command as UPSERT with a unique constraint is a bit of a booby trap for users. This is a lot like the ADD COLUMN with a default rewrites the whole table booby trap which hundreds of our users complain about every month. We don't want to add more such unexpected consequences for users. I think if we use the MERGE command for this feature we would need to use a non-standard keyword to specify that we want OLTP/UPSERT functionality. That would allow us to mostly use the MERGE standard syntax without having surprises about non-standard behavior. I am thinking of how CONCURRENTLY changes the behavior of some commands. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
EXPLAIN ALTER TABLE ? Good thing: People recognize it. Bad thing: People might not be able to tell the difference between a DDL and DML result. What about EXPLAIN DDL ...? The extra keyword (DDL) makes it a bit more explicit that the results are not comparable to the standard explain output. -- Steven Lembark 3646 Flora Pl Workhorse Computing St Louis, MO 63110 lemb...@wrkhors.com +1 888 359 3508 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL 2
On Mon, 01 Sep 2014 12:00:48 +0200 Marko Tiikkaja ma...@joh.to wrote: create a new language. There are enough problems with SQL in general, enough alternatives proposed over time that it might be worth coming up with something that Just Works. -- Steven Lembark 3646 Flora Pl Workhorse Computing St Louis, MO 63110 lemb...@wrkhors.com +1 888 359 3508 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL 2
Python2 - Python3 would've been a lot less painful if you could mark, on a module-by-module basis, whether a module was python2 or python3 code. It wasn't very practical for Python because python code can reach deep into the guts of unrelated objects discovered at runtime - it can add/replace member functions, even hot-patch bytecode. That's not something we allow in PL/PgSQL, though; from the outside a PL/PgSQL function is pretty opaque to callers. Perl does this with use version. Currently this guarantees that the compiler is a minimum version and also turns OFF later version's keywords. At that point someone could turn on/off the appropriate syntax with by module or code block. If you never turn on v2.0 you never get the new behavior; after that people can adjust the amount and location of later code to their own taste. -- Steven Lembark 3646 Flora Pl Workhorse Computing St Louis, MO 63110 lemb...@wrkhors.com +1 888 359 3508 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote: * Joshua D. Drake (j...@commandprompt.com) wrote: 2. What do you think such a feature should look like? I liked the other post that said: EXPLAIN ALTER TABLE or whatever. Heck it could even be useful to have EXPLAIN ANALZYE ALTER TABLE in case people want to run it on staging/test/dev environments to judge impact. The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefex every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs locks. I don't think you can simulate the side effects without locks, so getting the local view of changes will be extremely difficult unless you limit the scope considerably. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
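A crude version of that dry run is already possible today: run the DDL in a transaction, inspect pg_locks, and roll back. As noted above, the locks are genuinely taken while you look, so this is observational only, not side-effect free (a rewrite, if triggered, still happens before the rollback). Sketch, reusing the emp example from upthread:

```sql
BEGIN;
ALTER TABLE emp ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

-- Which locks did this session just take?
SELECT locktype, relation::regclass AS rel, mode
FROM pg_locks
WHERE pid = pg_backend_pid();

ROLLBACK;  -- nothing is kept, but the table was locked (and rewritten) meanwhile
```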
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost sfr...@snowman.net wrote: The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefex every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. It's kind of tricky to implement a patch to figure this out ahead of time. Some of the actual lock acquisitions are well hidden, in terms of how the code is structured. In others cases, it may not even be possible to determine ahead of time exactly what locks will be taken. As Harold mentioned, another idea along the same lines would be to decorate DDL with a NOWAIT no locking assertion and/or no rewrite assertion. Basically, if this DDL (or perhaps any DDL, if this is implemented as a GUC instead) necessitates a table rewrite (and requires an AccessExclusiveLock), throw an error. That's the case that most people care about. This may not even be good enough, though. Consider: Session 1 is a long running transaction. Maybe it's a spurious idle-in-transaction situation, but it could also be totally reasonable. It holds an AccessShareLock on some relation, as long running transactions are inclined to do. Session 2 is our migration. It needs an AccessExclusiveLock to ALTER TABLE on the same relation (or whatever). But it doesn't need a rewrite, which is good. 
It comes along and attempts to acquire the lock, blocking on session 1. Session 3 is an innocent bystander. It goes to query the same table in an ordinary, routine way - a SELECT statement. Even though session 2's lock is not granted yet, session 3 is not at liberty to skip the queue and get its own AccessShareLock. The effect is about the same as if session 2 did need to hold an AccessExclusiveLock for ages: read queries block for a long time. And yet, in theory session 2's impact on production should not be minimal, if we consider something like EXPLAIN output. Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a particularly bad case. NOWAIT might be the wrong thing for DDL generally. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
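[Editor's note: the three-session pile-up described above is easy to reproduce by hand; a sketch with a hypothetical table name, one statement per session:]

```sql
-- Session 1: long-running transaction holding an AccessShareLock
BEGIN;
SELECT count(*) FROM accounts;
-- ...left idle in transaction...

-- Session 2: the migration; queues behind session 1,
-- waiting for its AccessExclusiveLock
ALTER TABLE accounts ADD COLUMN note text;   -- blocks

-- Session 3: innocent bystander; its AccessShareLock request queues
-- behind session 2's pending AccessExclusiveLock, so even plain
-- reads hang until session 1 finishes
SELECT * FROM accounts LIMIT 1;              -- also blocks
```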
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 1:37 PM, Peter Geoghegan p...@heroku.com wrote: And yet, in theory session 2's impact on production should not be minimal, if we consider something like EXPLAIN output. Should have been minimal, I mean. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 5:37 PM, Peter Geoghegan p...@heroku.com wrote: Session 3 is an innocent bystander. It goes to query the same table in an ordinary, routine way - a SELECT statement. Even though session 2's lock is not granted yet, session 3 is not at liberty to skip the queue and get its own AccessShareLock. The effect is about the same as if session 2 did need to hold an AccessExclusiveLock for ages: read queries block for a long time. And yet, in theory session 2's impact on production should not be minimal, if we consider something like EXPLAIN output. The explain would show the AccessExclusiveLock, so it would be enough for a heads-up to kill all idle-in-transaction holding locks on the target relation (if killable, or just wait). Granted, it's something that's not easily automatable, whereas a nowait is. However, rather than nowait, I'd prefer cancellable semantics, that would cancel voluntarily if any other transaction requests a conflicting lock, like autovacuum does. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 6:00 PM, Peter Geoghegan p...@heroku.com wrote: Granted, it's something that's not easily automatable, whereas a nowait is. However, rather than nowait, I'd prefer cancellable semantics, that would cancel voluntarily if any other transaction requests a conflicting lock, like autovacuum does. I think the problem you'll have with NOWAIT is: you have an error from having to wait...what now? Do you restart? I imagine this would frequently result in what is effectively lock starvation. Any old AccessShareLock-er is going to make our migration tool restart. We'll never finish. I've done that manually (throw the DDL, and cancel if it takes more than a couple of seconds) on modest but relatively busy servers with quite some success. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
* Claudio Freire (klaussfre...@gmail.com) wrote: On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote: The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefix every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs locks. I don't think you can simulate the side effects without locks, Why? If you know the transaction is going to roll back and you only add entries to the catalog which aren't visible to any other transactions than your own, and you make sure that nothing you do actually writes data out which is visible to other transactions.. so getting the local view of changes will be extremely difficult unless you limit the scope considerably. I agree that there may be complexities, but I'm not sure this is really the issue.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 1:52 PM, Claudio Freire klaussfre...@gmail.com wrote: The explain would show the AccessExclusiveLock, so it would be enough for a heads-up to kill all idle-in-transaction holding locks on the target relation (if killable, or just wait). I think that there are very few problems with recognizing when an AccessExclusiveLock is needed or not needed. The exceptions to the rule that DDL needs such a lock are narrow enough that I have a hard time believing that most people think about it, or even need to think about it. I wish that wasn't the case, but it is. Granted, it's something that's not easily automatable, whereas a nowait is. However, rather than nowait, I'd prefer cancellable semantics, that would cancel voluntarily if any other transaction requests a conflicting lock, like autovacuum does. I think the problem you'll have with NOWAIT is: you have an error from having to wait...what now? Do you restart? I imagine this would frequently result in what is effectively lock starvation. Any old AccessShareLock-er is going to make our migration tool restart. We'll never finish. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote: The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefix every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. And that needs locks, especially if you modify the underlying filesystem layout. IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs locks. I don't think you can simulate the side effects without locks, Why? If you know the transaction is going to roll back and you only add entries to the catalog which aren't visible to any other transactions than your own, and you make sure that nothing you do actually writes data out which is visible to other transactions.. But that's not the scope. If you want a dry-run of table-rewriting DDL, or DDL interspersed with DML like: alter table blargh add foo integer; update blargh set foo = coalesce(bar, baz); You really cannot hope not to have to write data. The above is also the case with defaulted columns btw. 
so getting the local view of changes will be extremely difficult unless you limit the scope considerably. I agree that there may be complexities, but I'm not sure this is really the issue.. In essence, if you want MVCC catalog access without AEL, you're in for a rough ride. I'm not as experienced with pg's core as you, so you tell me, but I imagine it will be the case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian br...@momjian.us wrote: I think if we use the MERGE command for this feature we would need to use a non-standard keyword to specify that we want OLTP/UPSERT functionality. That would allow us to mostly use the MERGE standard syntax without having surprises about non-standard behavior. I am thinking of how CONCURRENTLY changes the behavior of some commands. That would leave you without a real general syntax. It'd also make having certain aspects of an UPSERT more explicit be a harder goal (there is no conventional join involved here - everything goes through a unique index). Adding the magic keyword would break certain other parts of the statement, so you'd have exact rules for what worked where. I see no advantage, and considerable disadvantages. Note that I've documented a lot of this stuff here: https://wiki.postgresql.org/wiki/UPSERT Mapping the join thing onto which unique index you want to make the UPSERT target is very messy. There are a lot of corner cases. It's quite ticklish. Please add to it if you think we've missed something. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 10/2/14 3:19 PM, Robert Haas wrote: 1..2 ok 1 - initdb with invalid option nonzero exit code ok 2 - initdb with invalid option prints error message # Looks like your test exited with 256 just after 2. not ok 3 - initdb options handling # Failed test 'initdb options handling' # at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229. Is this repeatable? Bisectable? Has it ever worked? Have you tried a clean build? Did you upgrade something in your operating system? It appears to work everywhere else. If none of this gets us closer to an answer, I can try to produce a patch that produces more details for such failures. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 2:04 PM, Claudio Freire klaussfre...@gmail.com wrote: I've done that manually (throw the DDL, and cancel if it takes more than a couple of seconds) on modest but relatively busy servers with quite some success. Fair enough, but that isn't the same as NOWAIT. It's something we'd have a hard time coming up with a general-purpose timeout for. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-02 17:09:43 -0400, Peter Eisentraut wrote: On 10/2/14 3:19 PM, Robert Haas wrote: 1..2 ok 1 - initdb with invalid option nonzero exit code ok 2 - initdb with invalid option prints error message # Looks like your test exited with 256 just after 2. not ok 3 - initdb options handling # Failed test 'initdb options handling' # at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229. Is this repeatable? Bisectable? Has it ever worked? Have you tried a clean build? Did you upgrade something in your operating system? It appears to work everywhere else. If none of this gets us closer to an answer, I can try to produce a patch that produces more details for such failures. FWIW, the current amount of details on errors is clearly insufficient. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
* Peter Geoghegan (p...@heroku.com) wrote: On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost sfr...@snowman.net wrote: The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefix every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. It's kind of tricky to implement a patch to figure this out ahead of time. Some of the actual lock acquisitions are well hidden, in terms of how the code is structured. In other cases, it may not even be possible to determine ahead of time exactly what locks will be taken. I was thinking this would be a new kind of transaction and we'd have to teach parts of the system about it- yes, that's pretty invasive, but it's at least one approach to consider. As Harold mentioned, another idea along the same lines would be to decorate DDL with a NOWAIT no locking assertion and/or no rewrite assertion. Basically, if this DDL (or perhaps any DDL, if this is implemented as a GUC instead) necessitates a table rewrite (and requires an AccessExclusiveLock), throw an error. That's the case that most people care about. The problem I see with this approach is outlined above. I agree that it may be independently valuable, but I don't see it as being a solution to the issue. This may not even be good enough, though. Consider: Session 1 is a long running transaction. 
Maybe it's a spurious idle-in-transaction situation, but it could also be totally reasonable. It holds an AccessShareLock on some relation, as long running transactions are inclined to do. Session 2 is our migration. It needs an AccessExclusiveLock to ALTER TABLE on the same relation (or whatever). But it doesn't need a rewrite, which is good. It comes along and attempts to acquire the lock, blocking on session 1. Session 3 is an innocent bystander. It goes to query the same table in an ordinary, routine way - a SELECT statement. Even though session 2's lock is not granted yet, session 3 is not at liberty to skip the queue and get its own AccessShareLock. The effect is about the same as if session 2 did need to hold an AccessExclusiveLock for ages: read queries block for a long time. And yet, in theory session 2's impact on production should not be minimal, if we consider something like EXPLAIN output. Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a particularly bad case. NOWAIT might be the wrong thing for DDL generally. I agree that this is a concern, but this feels to me like a next-step over top of the assess the locks required transaction type which I am trying to outline. Specifically, having a way to take the report of what locks are going to be required and then actually attempt to acquire them all (or fail if any can't be granted immediately) would be a natural next step and a way to start off the actual migration script- either all get acquired and the script runs to completion, or a lock isn't granted and the whole thing fails immediately without anything actually being done. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] DDL Damage Assessment
On 2014-10-02 17:03:59 -0400, Stephen Frost wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost sfr...@snowman.net wrote: The downside of the 'explain' approach is that the script then has to be modified to put 'explain' in front of everything and then you have to go through each statement and consider it. Having a 'dry-run' transaction type which then produces a report at the end feels like it'd be both easier to assess the overall implications, and less error-prone as you don't have to prefix every statement with 'explain'. It might even be possible to have the local view of post-alter statements be available inside of this 'dry-run' option- that is, if you add a column in the transaction then the column exists to the following commands, so it doesn't just error out. Having 'explain whatever' wouldn't give you that and so you really wouldn't be able to have whole scripts run by just pre-pending each command with 'explain'. That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. So you want to modify the catalog without actually doing the corresponding actions? That'll be heck of invasive. With changes all over the backend. We'll need to remove error checks (like for the existence of relfilenodes), remove rewriting, and such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On 2014-10-02 13:49:36 -0300, Claudio Freire wrote: EXPLAIN ALTER TABLE ? I don't think that'll work - there's already EXPLAIN for some CREATE. At least CREATE TABLE ... AS, CREATE VIEW ... AS and SELECT INTO. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
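[Editor's note: for reference, EXPLAIN already accepts utility statements that wrap a query, which is why a bare EXPLAIN ALTER TABLE would be ambiguous. A sketch of the existing, working form:]

```sql
-- EXPLAIN already works for DDL that contains a planned query:
EXPLAIN CREATE TABLE t AS SELECT generate_series(1, 1000) AS g;
```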
Re: [HACKERS] DDL Damage Assessment
* Claudio Freire (klaussfre...@gmail.com) wrote: On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote: That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. And that needs locks, especially if you modify the underlying filesystem layout. And we wouldn't be doing that, certainly. It's a dry-run. IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs locks. I don't think you can simulate the side effects without locks, Why? If you know the transaction is going to roll back and you only add entries to the catalog which aren't visible to any other transactions than your own, and you make sure that nothing you do actually writes data out which is visible to other transactions.. But that's not the scope. If you want a dry-run of table-rewriting DDL, or DDL interspersed with DML like: alter table blargh add foo integer; update blargh set foo = coalesce(bar, baz); You really cannot hope not to have to write data. The above is also the case with defaulted columns btw. The point is to not write anything which is visible to other transactions, which means we'd have to put DML into some different 'mode' which doesn't actually write where other processes might be looking. I'm not saying it's trivial to do, but I don't think it's impossible either. We might also be able to simply get away with short-circuiting them and not actually writing anything (and the same for reading data..). What would probably be useful is to review actual migration scripts and see if this would really work. I know they'd work for at least a subset of the migration scripts that I've dealt with before, and also not all of them. so getting the local view of changes will be extremely difficult unless you limit the scope considerably. I agree that there may be complexities, but I'm not sure this is really the issue.. 
In essence, if you want MVCC catalog access without AEL, you're in for a rough ride. I'm not as experienced with pg's core as you, so you tell me, but I imagine it will be the case. It's not clear to me what you're getting at as the 'rough' part, exactly.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] DDL Damage Assessment
* Andres Freund (and...@2ndquadrant.com) wrote: On 2014-10-02 17:03:59 -0400, Stephen Frost wrote: That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. So you want to modify the catalog without actually doing the corresponding actions? That'll be heck of invasive. With changes all over the backend. We'll need to remove error checks (like for the existance of relfilenodes), remove rewriting, and such. Yeah, I was getting at it being rather invasive earlier. It really depends on exactly what we'd support in this mode, which would depend on just what would be invasive and what wouldn't, I expect. I dislike the idea of not being able to actually run a real migration script though as anything else opens the very real possibility that the real script and the 'explain' script don't do the same thing, making this capability not nearly as useful.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Assertion failure in syncrep.c
On 18 September 2014 07:32, Pavan Deolasee pavan.deola...@gmail.com wrote: 564 /* 565 * Set state to complete; see SyncRepWaitForLSN() for discussion of 566 * the various states. 567 */ 568 thisproc-syncRepState = SYNC_REP_WAIT_COMPLETE; 569 570 /* 571 * Remove thisproc from queue. 572 */ 573 SHMQueueDelete((thisproc-syncRepLinks)); Yes, looks like a bugette to me also. Unlikely to occur except when no network involved, and even more luckily no effect at all when Assert is not enabled. So not a priority fix. But fix looks trivial, just to switch around the two statements above. I'll backpatch tomorrow. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Oct 2, 2014 at 02:08:30PM -0700, Peter Geoghegan wrote: On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian br...@momjian.us wrote: I think if we use the MERGE command for this feature we would need to use a non-standard keyword to specify that we want OLTP/UPSERT functionality. That would allow us to mostly use the MERGE standard syntax without having surprises about non-standard behavior. I am thinking of how CONCURRENTLY changes the behavior of some commands. That would leave you without a real general syntax. It'd also make having certain aspects of an UPSERT more explicit be a harder goal (there is no conventional join involved here - everything goes through a unique index). Adding the magic keyword would break certain other parts of the statement, so you'd have exact rules for what worked where. I see no advantage, and considerable disadvantages. Note that I've documented a lot of this stuff here: https://wiki.postgresql.org/wiki/UPSERT Mapping the join thing onto which unique index you want to make the UPSERT target is very messy. There are a lot of corner cases. It's quite ticklish. Please add to it if you think we've missed something. OK, it was just an idea I wanted to point out, and if it doesn't work, it more clearly cements that we need UPSERT _and_ MERGE. Josh was pointing out that we don't want to surprise our users, so I suggested an additional keyword, which addresses his objections, but as you said, if that standard MERGE syntax doesn't give us what we want, then that is the fatal objection to using only MERGE. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
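[Editor's note: the "everything goes through a unique index" design being discussed is the shape the feature eventually took when committed in 9.5. A sketch, assuming a hypothetical `counters` table with a unique constraint on `key`:]

```sql
-- Upsert targeting the unique index on (key); the conflict target names
-- the index columns rather than specifying a MERGE-style join:
INSERT INTO counters (key, n) VALUES ('hits', 1)
ON CONFLICT (key) DO UPDATE SET n = counters.n + 1;
```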
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 6:19 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost sfr...@snowman.net wrote: That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views. Eh? We have MVCC catalog access. And that needs locks, especially if you modify the underlying filesystem layout. And we wouldn't be doing that, certainly. It's a dry-run. ... (...) We might also be able to simply get away with short-circuiting them and not actually writing anything (and the same for reading data..). What would probably be useful is to review actual migration scripts and see if this would really work. I know they'd work for at least a subset of the migration scripts that I've dealt with before, and also not all of them. I believe, for it to be reasonably intrusive, you'd have to abort at the first need to actually read/write data. Most of my migration scripts, especially the ones that would benefit from this, require some of that, but that's just my personal common practice, not a general case. IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs locks. I don't think you can simulate the side effects without locks, Why? If you know the transaction is going to roll back and you only add entries to the catalog which aren't visible to any other transactions than your own, and you make sure that nothing you do actually writes data out which is visible to other transactions.. But that's not the scope. If you want a dry-run of table-rewriting DDL, or DDL interspersed with DML like: alter table blargh add foo integer; update blargh set foo = coalesce(bar, baz); You really cannot hope not to have to write data. The above is also the case with defaulted columns btw. 
The point is to not write anything which is visible to other transactions, which means we'd have to put DML into some different 'mode' which doesn't actually write where other processes might be looking. I'm not saying it's trivial to do, but I don't think it's impossible either. (...) No, I don't think it's impossible either. Just very, very time-consuming. Both in developing the patch and in its eventual maintenance. TBH, a separate read-only transaction like explain alter would also be quite difficult to keep in sync with actual alter logic, unless it's handled by the same code (unlikely in that form). so getting the local view of changes will be extremely difficult unless you limit the scope considerably. I agree that there may be complexities, but I'm not sure this is really the issue.. In essence, if you want MVCC catalog access without AEL, you're in for a rough ride. I'm not as experienced with pg's core as you, so you tell me, but I imagine it will be the case. It's not clear to me what you're getting at as the 'rough' part, exactly.. A lot of work, touching most of the codebase, and hard to maintain in the end. Unless you limit the scope, as you said, just touching the catalog, not data, could be doable. But it would act as an implicit norewrite. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
Alvaro Herrera alvhe...@2ndquadrant.com writes: - will the table have to be rewritten? the indexes? Please give my DDL deparsing patch a look. There is a portion there about deparsing ALTER TABLE specifically; what it does is save a list of subcommands, and for each of them we either report the OID of the object affected (for example in ADD CONSTRAINT), or a column number (for ALTER COLUMN RENAME, say). It sounds like you would like to have some extra details returned: for instance the does the whole of it require a table rewrite bit. It sounds like it can be trivially returned in the JSON Some years ago when working on the Event Trigger framework we did mention providing some interesting events, such as a TableRewrite Event. In between what you're saying here and what Harold and Peter Geoghegan are mentioning (basically that dealing with table rewrites is 90% of the need for them), it could be that the best way to have at it would be to add that Event in the Event Trigger mechanism. We could also add an AccessExclusiveLock Event that would fire just before actually taking the lock, allowing people to RAISE EXCEPTION in that case, or to maybe just do the LOCK … NOWAIT themselves in the trigger. For the locking parts, best would be to do the LOCK … NOWAIT dance for all the tables touched by the DDL migration script. The Event Trigger approach will not solve that, unfortunately. Regards, -- Dimitri Fontaine06 63 07 10 78 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
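[Editor's note: the LOCK … NOWAIT dance referred to above — taking every lock up front so the script fails fast instead of queueing behind other sessions — can be sketched as follows, with hypothetical table names:]

```sql
BEGIN;
-- Grab every lock the migration will need, erroring out immediately
-- if any is unavailable rather than waiting in the lock queue:
LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE NOWAIT;
LOCK TABLE orders   IN ACCESS EXCLUSIVE MODE NOWAIT;
-- The DDL runs only once all locks are already held:
ALTER TABLE accounts ADD COLUMN note text;
COMMIT;
```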
Re: [HACKERS] DDL Damage Assessment
On 10/02/2014 01:15 PM, Joe Conway wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 10/02/2014 11:30 AM, Dimitri Fontaine wrote: Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? +1 I really like the idea and would find it useful/time-saving 2. What do you think such a feature should look like? Elsewhere on this thread EXPLAIN was suggested. That makes a certain amount of sense. Maybe something like EXPLAIN IMPACT [...] 3. Does it make sense to support the whole set of DDL commands from the get go (or ever) when most of them are only taking locks in their own pg_catalog entry anyway? Yes, I think it should cover all commands that can have an availability impact. In principle I agree with the sentiment. However, that full coverage is a nice goal, seldom achieved. The real question is at what level of information, returned to the user, does this feature become user friendly? It is one thing to provide information of the kind of TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2) That information is useful, but only to an experienced DBA who knows their schema and data to a certain degree. The majority of users, I fear, will not be able to even remotely guesstimate if that will need seconds or hours. There needs to be more detail information for those cases and I believe that tackling them one at a time in depth will lead to more useful results than trying to cover a lot but shallow. My $.02 Jan -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for updating src/timezone
John Cochran j69coch...@gmail.com writes:

As it is, I've finished checking the differences between the postgres and IANA code for zic.c after editing both to eliminate non-functional style differences such as indentation, function prototypes, comparing strchr results against NULL or 0, etc. It looks like the only differences from the stock code are support for the -P option implemented by Tom Lane and the substitution of the postgres code for getopt instead of the unix default, as well as a few minor changes in some defaults and minor modifications to deal with Windows. Overall, rather small and trivial.

Additionally, I discovered a little piece of low-hanging fruit. The Makefile for the timezone code links together zic.o, ialloc.o, scheck.o and localtime.o in order to make zic. Additionally, zic.c has a stub implementation of pg_open_tzfile() in order to resolve a linkage issue with the localtime.o module for zic. Well, as it turns out, localtime.o doesn't supply ANY symbols that zic needs and therefore can be omitted entirely from the list of object files comprising zic. Which in turn means that the stub implementation of pg_open_tzfile can be removed from the postgres version of zic.c. I'm not bothering to submit a patch involving this since that patch would be quite short-lived given my objective to bring the code up to date with the 2014e version of the IANA code. But I am submitting a bug report to IANA on the Makefile since the unneeded linkage with localtime.o is still in the 2014e code on their site.

John, have you made any further progress on this since July? The urgency of updating our timezone code has risen quite a bit for me, because while testing an update of the data files to tzdata2014h I became aware that the -P option is failing to print a noticeable number of zone abbreviations that clearly exist in the data files.
Since the -P hack itself is so simple, it's hard to come to any other conclusion than that the rest of the timezone code is buggy --- presumably because it's four years behind what IANA is targeting with the data files. I spent a little bit of time looking at just applying the diffs between tzcode2010c and tzcode2014h to our code, but the diffs are large enough that that seems both tedious and quite error-prone. I think we'd be better advised to push forward with the idea of trying to absorb more-or-less-unmodified versions of the timezone code files. (I'm particularly disillusioned with the idea that we'll keep on pgindent'ing that code --- the effort-to-reward ratio even to keep the comments readable just ain't very good.) regards, tom lane
Re: [HACKERS] NEXT VALUE FOR sequence
On 3 October 2014 00:01, Thomas Munro mu...@ip9.org wrote: On 2 October 2014 14:48, Tom Lane t...@sss.pgh.pa.us wrote: Thomas Munro mu...@ip9.org writes: SQL:2003 introduced the function NEXT VALUE FOR sequence. Google tells me that at least DB2, SQL Server and a few niche databases understand it so far. As far as I can tell there is no standardised equivalent of currval and setval (but I only have access to second hand information about the standard, like articles and the manuals of other products). Have you checked the archives about this? My recollection is that one reason it's not in there (aside from having to reserve NEXT) is that the standard-mandated semantics are not the same as nextval(). Right, I found the problem: If there are multiple instances of next value expressions specifying the same sequence generator within a single SQL-statement, all those instances return the same value for a given row processed by that SQL-statement. This was discussed in a thread from 2002 [1]. So the first step would be to make a standard conforming function to transform the standard's syntax into. I found the text in the 20nn draft specification and it didn't seem immediately clear what 'statement' should mean, for example what if your statement calls pl/pgsql which contains further statements, and what if triggers, default expressions, etc are invoked? I suppose one approach would be to use command IDs as the scope. Do you think the following change would make sense? In struct SeqTableData (from sequence.c), add a member last_command_id. When you call the new function, let's say nextval_for_command(regclass), if last_command_id matches GetCommandId() then it behaves like currval_oid and returns last, otherwise it behaves like nextval_oid, and updates last_command_id to the current command ID. Actually scratch that, it's not about statements, it's about rows processed by that SQL-statement. I will think about how that could be interpreted and implemented... 
Best regards, Thomas Munro
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Wed, Oct 1, 2014 at 2:42 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: So, what's the correct/best grammar? CREATE [ IF NOT EXISTS ] [ UNIQUE ] INDEX index_name or CREATE [ UNIQUE ] INDEX [ IF NOT EXISTS ] index_name

I've elected myself as the reviewer for this patch. Here are some preliminary comments...

I agree with José. The 2nd is more consistent given the other syntaxes: CREATE { TABLE | SCHEMA | EXTENSION | ... } IF NOT EXISTS name ... It's also compatible with SQLite's grammar: https://www.sqlite.org/lang_createindex.html

Do we want to enforce an order on the keywords or allow both? CREATE INDEX IF NOT EXISTS CONCURRENTLY foo ... CREATE INDEX CONCURRENTLY IF NOT EXISTS foo ... It's probably very rare to use both keywords at the same time, so I'd prefer only the 2nd, unless someone else chimes in.

Documentation: I would prefer if the explanation were consistent with the description for ALTER TABLE/EXTENSION; just copy it and replace "relation" with "index".

    + ereport(NOTICE,
    +         (errcode(ERRCODE_DUPLICATE_TABLE),
    +          errmsg("relation \"%s\" already exists, skipping",
    +                 indexRelationName)));

1. Clearly "relation" should be "index". 2. Use ERRCODE_DUPLICATE_OBJECT, not TABLE.

    + if (n->if_not_exists && n->idxname == NULL)
    +     ereport(ERROR,
    +             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    +              errmsg("IF NOT EXISTS requires that you name the index."),

I think ERRCODE_SYNTAX_ERROR makes more sense; it's something that we decided we *don't want* to support.

    - write_msg(NULL, "reading row-security enabled for table \"%s\"",
    + write_msg(NULL, "reading row-security enabled for table \"%s\"\n",

??? Regards, Marti
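A sketch of the keyword ordering Marti favors, assuming the syntax the patch under review proposes (the table and index names are hypothetical; this form was not yet accepted by PostgreSQL at the time of the thread):

```sql
-- IF NOT EXISTS after the INDEX keyword, matching CREATE TABLE /
-- SCHEMA / EXTENSION ... IF NOT EXISTS and SQLite's grammar:
CREATE UNIQUE INDEX IF NOT EXISTS foo_a_idx ON foo (a);

-- Combined with CONCURRENTLY, only this ordering would be allowed:
CREATE INDEX CONCURRENTLY IF NOT EXISTS foo_b_idx ON foo (b);
```

If the named index already exists, the command would raise a NOTICE and skip creation instead of erroring out.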
Re: [HACKERS] NEXT VALUE FOR sequence
Thomas Munro mu...@ip9.org writes: On 2 October 2014 14:48, Tom Lane t...@sss.pgh.pa.us wrote: Have you checked the archives about this? My recollection is that one reason it's not in there (aside from having to reserve NEXT) is that the standard-mandated semantics are not the same as nextval(). Right, I found the problem: If there are multiple instances of next value expressions specifying the same sequence generator within a single SQL-statement, all those instances return the same value for a given row processed by that SQL-statement. This was discussed in a thread from 2002 [1]. Wow, it was that far back? No wonder I didn't remember the details. I suppose one approach would be to use command IDs as the scope. The spec clearly says one value per row, not one per statement; so command ID is very definitely not the right thing. regards, tom lane
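The semantic difference Tom points out can be illustrated like this; the NEXT VALUE FOR part is hypothetical, since PostgreSQL does not implement it:

```sql
CREATE SEQUENCE s;

-- With PostgreSQL's nextval(), every call advances the sequence, so
-- two calls in the same row produce two different values:
SELECT nextval('s'), nextval('s');   -- two distinct values, e.g. 1 and 2

-- Under SQL:2003 semantics (NOT implemented in PostgreSQL), multiple
-- NEXT VALUE FOR expressions on the same sequence must return the
-- SAME value within each processed row:
--   SELECT NEXT VALUE FOR s, NEXT VALUE FOR s;   -- same value twice
```

In a multi-row statement the standard requires a fresh value per row but one value within each row, which is why neither nextval() nor a per-statement (command ID) scope matches the spec.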
Re: [HACKERS] DDL Damage Assessment
On 10/02/2014 06:43 PM, Jan Wieck wrote: The real question is at what level of information, returned to the user, does this feature become user-friendly? It is one thing to provide information of the kind of TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo / TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2). That information is useful, but only to an experienced DBA who knows their schema and data to a certain degree. The majority of users, I fear, will not be able to even remotely guesstimate if that will need seconds or hours. There needs to be more detailed information for those cases and I believe that tackling them one at a time in depth will lead to more useful results than trying to cover a lot but shallow.

Perhaps. This and other posts on this thread make me wonder if some kind of extension to the stats collector and pg_stat_statement might be a good place to start?

Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Thu, Oct 2, 2014 at 5:08 PM, Greg Smith greg.sm...@crunchydatasolutions.com wrote: When 9.4 is already giving a more than 100% gain on this targeted test case, I can't see that chasing after maybe an extra 10% is worth having yet another GUC around. Especially when it will probably take multiple tuning steps before you're done anyway; we don't really know the rest of them yet; and when we do, we probably won't need a GUC to cope with them in the end anyway.

Agreed. I think that prior to 9.4, the logging performance of Postgres was very competitive when compared to other systems. At this stage, it's probably extremely fast by any standard. Amit's work on only WAL-logging the modified portion of UPDATEs helps here too.

I tend to believe that the next big round of performance gains can be had by working on the buffer manager, and B-Tree indexes. At some point we should work on prefix compression within B-Tree leaf pages. We should also work on adding abbreviated keys to B-Tree internal pages. Doing so should almost remove the benefit of using the C locale, because most comparisons needed for index scans can use comparisons implemented as nothing more than a memcmp() (note that internal pages have values that are naturally heterogeneous, so this will work well). -- Peter Geoghegan
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Alvaro Herrera wrote: Basically, if you are on 9.3.5 or earlier any per-table options for autovacuum cost delay will misbehave (meaning: any such table will be processed with settings flattened according to balancing of the standard options, _not_ the configured ones). If you are on 9.3.6 or newer they will behave as described in the docs. Another thing to note is that if you have configured a table to have cost_limit *less* than the default (say 150 instead of the default 200), the balance system will again break that and process the table at 200 instead; in other words, the balancing system has completely broken the ability to tweak the cost system for individual tables in autovacuum. That's certainly pretty ugly. With the v5 patch, the example tables above will be vacuumed at exactly 5000 and 150 instead. The more complex patch I produced earlier would have them vacuumed at something like 4900 and 100 instead, so you wouldn't exceed the total of 5000. I think there is some value to that idea, but it seems the complexity of managing this is too high. Agreed. I am rather surprised that nobody has reported this problem before. I am now of the mind that this is clearly a bug that should be fixed all the way back. I'm coming around to that also, however, should we worry about users who set per-table settings and then simply forgot about them? I suppose that won't matter too much unless the table is really active, and if it is, they've probably already set it to zero. Thanks, Stephen
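The per-table options under discussion are set as storage parameters; a sketch matching the 5000/150 example above (table names are hypothetical):

```sql
-- Per-table autovacuum cost settings of the kind discussed above;
-- with the balancing bug, autovacuum could flatten these back to the
-- global defaults instead of honoring them.
ALTER TABLE big_table   SET (autovacuum_vacuum_cost_limit = 5000);
ALTER TABLE small_table SET (autovacuum_vacuum_cost_limit = 150);
```

Note in particular the second case: a limit *below* the default is exactly the configuration the old balancing code would silently override.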
Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements
On Tue, Aug 26, 2014 at 4:20 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04/14/2014 10:31 PM, Fabrízio de Royes Mello wrote: The attached patch contains CINE for sequences. I just strip this code from the patch rejected before. Committed with minor changes.

Hmm, the CommitFest app lists Marko Tiikkaja as the reviewer, but I can't find his review anywhere...

The documentation claims: CREATE [ IF NOT EXISTS ] SEQUENCE name. But the grammar implements it the other way around: CREATE SEQUENCE IF NOT EXISTS name;

Regards, Marti
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Fri, Oct 3, 2014 at 2:15 AM, Marti Raudsepp ma...@juffo.org wrote:

    + ereport(NOTICE,
    +         (errcode(ERRCODE_DUPLICATE_TABLE),
    +          errmsg("relation \"%s\" already exists, skipping",
    +                 indexRelationName)));

1. Clearly "relation" should be "index". 2. Use ERRCODE_DUPLICATE_OBJECT, not TABLE.

My bad, this code is OK. The current code already uses "relation" and TABLE elsewhere because indexes share the same namespace with tables.

    + /*
    +  * Throw an exception when IF NOT EXISTS is used without a named
    +  * index
    +  */

I'd say "without an index name". And the line goes beyond 80 characters wide. I would also move this check to after all the attributes have been assigned, rather than splitting the assignments in half.

Regards, Marti
Re: [HACKERS] TAP test breakage on MacOS X
On Thu, Oct 2, 2014 at 5:09 PM, Peter Eisentraut pete...@gmx.net wrote: On 10/2/14 3:19 PM, Robert Haas wrote: 1..2 ok 1 - initdb with invalid option nonzero exit code ok 2 - initdb with invalid option prints error message # Looks like your test exited with 256 just after 2. not ok 3 - initdb options handling # Failed test 'initdb options handling' # at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229. Is this repeatable? Bisectable? Has it ever worked? Have you tried a clean build? Did you upgrade something in your operating system? I don't think it's ever worked. I just didn't get around to reporting it before now. If none of this gets us closer to an answer, I can try to produce a patch that produces more details for such failures. A test that fails for no reason that can be gleaned from the output is not an improvement over not having a test at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_receivexlog and replication slots
On Thu, Oct 2, 2014 at 12:44 AM, Andres Freund and...@2ndquadrant.com wrote: I pushed the first part. Thanks. Attached is a rebased version of patch 2, implementing the actual feature.

One thing I noticed with more testing is that if --create is used and the destination folder does not exist, pg_receivexlog was creating the slot and leaving with an error. This does not look user-friendly, so I changed the logic a bit to check for the destination folder before creating any slot. This results in a bit of refactoring, but this way the behavior is more intuitive. Regards, -- Michael

From e3ee4b918b8122d414c03207fdd50f2472dc292e Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 1 Sep 2014 20:53:45 +0900
Subject: [PATCH] Support for replslot creation and drop in pg_receivexlog

Using the new actions --create and --drop that are similarly present in
pg_recvlogical, a user can respectively create and drop a replication
slot that can be used afterwards when fetching WALs.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   |  29 +++
 src/bin/pg_basebackup/pg_receivexlog.c | 154 +
 2 files changed, 168 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 5916b8f..69aa0af 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   <title>Options</title>

   <para>
+   <application>pg_receivexlog</application> can run in one of two following
+   modes, which control physical replication slot:
+
+   <variablelist>
+
+    <varlistentry>
+     <term><option>--create</option></term>
+     <listitem>
+      <para>
+       Create a new physical replication slot with the name specified in
+       <option>--slot</option>.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><option>--drop</option></term>
+     <listitem>
+      <para>
+       Drop the replication slot with the name specified in
+       <option>--slot</option>, then exit.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+
+  </para>
+
+  <para>
    The following command-line options control the location and format of
    the output.

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 171cf43..8c4f752 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,11 +38,15 @@
 static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;	/* 10 sec = default */
 static int	fsync_interval = 0;	/* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;

 static void usage(void);
+static DIR* get_destination_dir(char *dest_folder);
+static void close_destination_dir(DIR *dest_dir, char *dest_folder);
 static XLogRecPtr FindStreamingStart(uint32 *tli);
-static void StreamLog();
+static void StreamLog(void);
 static bool stop_streaming(XLogRecPtr segendpos, uint32 timeline,
 			   bool segment_finished);
@@ -78,6 +82,9 @@ usage(void)
 	printf(_("  -w, --no-password      never prompt for password\n"));
 	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("\nOptional actions:\n"));
+	printf(_("      --create           create a new replication slot (for the slot's name see --slot)\n"));
+	printf(_("      --drop             drop the replication slot (for the slot's name see --slot)\n"));
 	printf(_("\nReport bugs to <pgsql-b...@postgresql.org>.\n"));
 }
@@ -118,6 +125,44 @@ stop_streaming(XLogRecPtr xlogpos, uint32 timeline, bool segment_finished)
 	return false;
 }
+
+/*
+ * Get destination directory.
+ */
+static DIR*
+get_destination_dir(char *dest_folder)
+{
+	DIR		   *dir;
+
+	Assert(dest_folder != NULL);
+	dir = opendir(dest_folder);
+	if (dir == NULL)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, basedir, strerror(errno));
+		disconnect_and_exit(1);
+	}
+
+	return dir;
+}
+
+
+/*
+ * Close existing directory.
+ */
+static void
+close_destination_dir(DIR *dest_dir, char *dest_folder)
+{
+	Assert(dest_dir != NULL && dest_folder != NULL);
+	if (closedir(dest_dir))
+	{
+		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
+				progname, dest_folder, strerror(errno));
+		disconnect_and_exit(1);
+	}
+}
+
+
 /*
  * Determine starting location for streaming, based on any existing xlog
  * segments in the directory. We start at the end of the last one that is
@@ -134,13 +179,7 @@ FindStreamingStart(uint32 *tli)
 	uint32		high_tli = 0;
 	bool		high_ispartial = false;

-	dir = opendir(basedir);
-	if (dir == NULL)
-	{
-		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
-				progname, basedir, strerror(errno));
-		disconnect_and_exit(1);
-	}
+	dir = get_destination_dir(basedir);

 	while (errno = 0, (dirent =
[HACKERS] GiST splitting on empty pages
This is from Bug #11555, which is still in moderation as I type this (analysis was done via IRC). The GiST insertion code appears to have no length checks at all on the inserted entry. index_form_tuple checks for length = 8191, with the default blocksize, but obviously a tuple less than 8191 bytes may still not fit on the page due to page header info etc. However, the gist insertion code assumes that if the tuple doesn't fit, then it must have to split the page, with no check whether the page is already empty. So this crashes with infinite recursion in gistSplit (which also lacks a check_stack_depth() call): create extension pg_trgm; create table t1 (a text); create index on t1 using gist (a gist_trgm_ops); insert into t1 values ( 'vvsehbxxlezgwtyvbgdyburmhxmuzbwvwoaepxbbbaiyuguwnxvprmbzkotkqqfnegruds' 'wftedolykzvfonsqndehouxuibazwdrtjlynzjlkihqxvjnimrpbmnvupvtlzlejxdwwmh' 'hvpxtkggstyivlvqgkmawmlbvjerfrzmnokgyrnrllagwwxdgjddwofrxjidbiqowbvusi' 'mdumkrihuprxsnmyekhnojvexsftmybzcwlmntuijlfcyracciqqrmuoeairzkqbgcouvi' 'cfthhszhvbplshxmwcnetnokovmdpimrnfzuxzcsaseszcfvetaxgoivjuzzclqprqkopn' 'hqgmjgoocsicqpqylatkzvvqlmhwbwjjmpvvwkkyctatirstsldsgzismqonmxxzntvkdf' 'ifzizharbsdfkjetcrjqfwocvcvqmywuevvevyjgiozkfialfpnarjqdinymibqlem' 'qakzgtofeuoeftutulclpkynxgoostaitkizewfirunxnhqhttsiervbxkpqdqyxbhxfdc' 'nvwbskiirbckkgbfizqypuoorpvovzqiunjnxswpuyaefbkobbmrvbgmrbbmbsvwffjcxf' 'ssesxjtiyvjkmemsrdusqvklspqbsohkhlcevwmtrveeaqwrurjknuwfkngcbnnjzpnvma' 'odvsiwjfnewxpjslocyveajsjjhxeuxsxtlgqvldzhbortagvybazlsjuagyueqsycyoxj' 'swrtljnlpikrjjccswczuxelpdnorlyjhpdszqdozngjxilqfoqalumaxapplnzscclctp' 'rtdxdagorlchmocypayepjrpcusowldnfgkihrxzcagoojndjmizwzoyugmqsqeyxpgege' 'ejelytulxeyfdufsszzfqrvupskwxrbbafnyzhkwlicpchivhhaxywsopdlnumpusctrje' 'ovmqlpytlfamdziwnxzyltlaodciummihzzsoxmadmmgluczscxdwiekmgsgsfpaeostme' 'tprfwcazbtbzwyibiuhbbahqalftfryyhpeeseduxftudcvmwdoxdvodgtxllvktkoxdta' 'xrgqmjtiwqlknpfctmwqyhliawxyzrywienvogdkwovkeanxmjnkrztsvrqviprquflimp' 
'tjeouiphfcrtnisgaoxrjfgbwahijuxddbsxkhfqjwjwfcdgrbxagdcdekmoekshmkfwsl' 'mbivynyctpdrqkutnzdaohkgpwqvsihfkpajczlwoonfziynibnwjxczttumcbrnrswtri' 'qgxelwmjjvlwruuutnoozqpregjbaajrhhvsdicndnkvhepbseprvfjzmsamtkearzsuiu' 'ilhsgpwwqoafgvkpuwhujbenbwnuqvoygwrnlnjccjhiesyyogtyhymiuzclvrkbobpapy' 'crhjalcykreepmdbvyaxkvpuxdwmdllfcspdesbpjguysyaowbmhwbcufyhiksonleqpws' 'ffyzerxefufrcctexydegnxvajzrywjebiegzckfxqxzsqdpohuvusrvbrmanwepeivelg' 'jiwhhoxlemszimraisrvterytwhpasvkarrhgptlclklyblhuccnhumbqtqrllcldutkkn' 'vmyfyxhkecmhqubcvsvmkgxmsbgllqyhdxmbuulzwygmtipoakakqywjadvltusxrfymzk' 'mwjsjcayqbirlzpiipmebfyucqabcampwvigxieoknfwnvfvlranxyiaoibringfjolgxq' 'uhdaeqwjmhamvxldxzlzqunxawmmdjcyrgzvxvfjcfwydhbbhmbxhovhlhtoqwnicmeahj' 'jkpgitojuvwvtekomvwfkncxvfkzfrjpcyvlskvmfrizwsoiokoyxqwsvhsazbpbalmsvh' 'fbznavgoeuystwjpoexhfwjvxkgkdcridrwdncsrxrkqntgbydjdzszwcfgghyolqlodnh' 'ukyfblyhnwkwajpzgsfnynlnybynfmuzxseyddfrapnaycafugsstdfsfefkqaknsplwsq' 'ntgbufdukybcrugxnmbsxrsielxqiqhwjnxdtbydzzgqunnhzoawgsflecbmtjjcxggqhe' 'tgeaxynkgmzgjgzordrtqkdznaftqnyktdkrcxcikbouiniarathkxgyxmsnzrytuikwfm' 'eqotkxgtxxtrfeomclyvzymxrggcdmpicebmbifyfzpldexgqvbptnnlutnxfdfihhuipa' 'hvaxgdbdkszliszvetpsrvvxddeymuytpyrvctzmlyytrxovreojzjhcnlazgzsvykrbdq' 'nopmhgjwcbaqlaasdneemkdfgcpxdtoqhddoknlmomdzmdrprvtegxkmzajctytacxpmka' 'zzncyzgqpxmjcsgmfgmojfndgpawckwbjjeijlzzjmilfpxkwkzfqmjxbjteuqfeaknjvm' 'iezrqegnodynjpasmbbffwvlavwnfraowzfmdmaspygyograisgopcaqxwednerkexwijw' 'azvhyjnpkwiqkxsloqhsuvwlfbjbjtykturrefhpcpfnnyybpftjaqvfsfhbygmraejekq' 'umfzztyxuocoydftixzqzxwlpxpyczowmuwlnuiiilxgocaxaaozxklnialkaagmucyixh' 'qgsnnhqnfqntpleaymbkxckdpfgnnduejnrwuikayytokyoilqtisdmhisvwwpafcscxan' 'xylrnvpcebsxjlbvtkoogkegqhzsfzgdyrulnknslgqusrqmbebhpfofnnysnewlvqxjal' 'cmrshkjxwkcxsrdhwquujhzftvwqexbgjtyqdioatqxliatfrnabvzhoueeybgflzecdmq' 'dghbsqclvuyvvtudiohm' ); Suggested fixes (probably all of these are appropriate): 1. gistSplit should have check_stack_depth() 2. 
gistSplit should probably refuse to split anything if called with only one item (which is the value being inserted). 3. somewhere before reaching gistSplit it might make sense to check explicitly (e.g. in gistFormTuple) whether the tuple will actually fit on a page. 4. pg_trgm probably should do something more sensible with large leaf items, but this is a peripheral issue since ultimately the gist core must enforce these limits rather than rely on the opclass. -- Andrew (irc:RhodiumToad)
Re: [HACKERS] TAP test breakage on MacOS X
Robert Haas robertmh...@gmail.com writes: make check-world dies ingloriously for me, like this: FWIW, it works fine for me on my Mac laptop, using the Perl 5.16.2 that comes standard with OSX 10.9.5. I did have to install IPC::Run from CPAN though. # Failed test 'initdb options handling' # at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229. This output seems to be pretty clear proof that you're not using Apple's Perl. What is it exactly (where did you get it from)? Also, noticing that what you're using is evidently Perl 5.12, I'm wondering whether our TAP test scripts require a fairly new Perl version. I recall some of my Salesforce colleagues griping that the TAP scripts didn't work with older Perls. regards, tom lane
Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements
On Thu, Oct 2, 2014 at 9:38 PM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Aug 26, 2014 at 4:20 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04/14/2014 10:31 PM, Fabrízio de Royes Mello wrote: The attached patch contains CINE for sequences. I just strip this code from the patch rejected before. Committed with minor changes. Hmm, the CommitFest app lists Marko Tiikkaja as the reviewer, but I can't find his review anywhere...

Maybe he had no time to review it.

The documentation claims: CREATE [ IF NOT EXISTS ] SEQUENCE name. But the grammar implements it the other way around: CREATE SEQUENCE IF NOT EXISTS name;

You are correct. Fix attached.

Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello

diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index 7292c3f..9e364ff 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation

 <refsynopsisdiv>
<synopsis>
-CREATE [ TEMPORARY | TEMP ] [ IF NOT EXISTS ] SEQUENCE <replaceable class="parameter">name</replaceable> [ INCREMENT [ BY ] <replaceable class="parameter">increment</replaceable> ]
+CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] <replaceable class="parameter">name</replaceable> [ INCREMENT [ BY ] <replaceable class="parameter">increment</replaceable> ]
     [ MINVALUE <replaceable class="parameter">minvalue</replaceable> | NO MINVALUE ] [ MAXVALUE <replaceable class="parameter">maxvalue</replaceable> | NO MAXVALUE ]
     [ START [ WITH ] <replaceable class="parameter">start</replaceable> ] [ CACHE <replaceable class="parameter">cache</replaceable> ] [ [ NO ] CYCLE ]
     [ OWNED BY { <replaceable class="parameter">table_name</replaceable>.<replaceable class="parameter">column_name</replaceable> | NONE } ]
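With the documentation fixed, the synopsis matches what the grammar actually accepts; a sketch (the sequence name is hypothetical, and the NOTICE wording is assumed to match the other IF NOT EXISTS variants):

```sql
-- IF NOT EXISTS goes after the SEQUENCE keyword, as the grammar
-- implements it:
CREATE SEQUENCE IF NOT EXISTS serial_seq;

-- Running the same statement again is not an error; it is skipped
-- with a NOTICE such as:
--   NOTICE:  relation "serial_seq" already exists, skipping
CREATE SEQUENCE IF NOT EXISTS serial_seq;
```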
Re: [HACKERS] DDL Damage Assessment
On 10/2/14, 2:43 PM, Josh Berkus wrote: Questions: 1. Do you agree that a systematic way to report what a DDL command (or script, or transaction) is going to do on your production database is a feature we should provide to our growing user base? Yes. +1

2. What do you think such a feature should look like? As with others, I think EXPLAIN is a good way to do this without adding a keyword. So you'd do: EXPLAIN ALTER TABLE

I'm thinking it would be better to have something you could set at a session level, so you don't have to stick EXPLAIN in front of all your DDL.

As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a development environment, which is where you would simply deploy the real DDL using verbose mode and see what the underlying commands actually do.