Re: [HACKERS] pg_upgrade log files
Alvaro Herrera writes:
> I propose this patch which echoes the commands to the respective log
> files. I would backpatch this to 9.2.

OK, but the fflush just before fclose seems a bit pointless; fclose will do that anyway, no?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_upgrade log files
Right now, the pg_upgrade output files contain the output of the commands it runs. However, they do not contain the actual commands being run, so if you want to figure out what is happening, you have to trace the pg_upgrade source while reading the log file. This is, to say the least, suboptimal.

I propose this patch which echoes the commands to the respective log files. I would backpatch this to 9.2.

As a note, I find the exec_prog() function rather bizarre. It would be better if the caller did not have to manually specify a redirection, for instance.

-- Álvaro Herrera

0001-Make-the-pg_upgrade-log-files-contain-actual-command.patch
Description: Binary data
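For illustration, the idea of the patch can be sketched as a tiny wrapper that writes the command line into the log before running it. The function and file names here are made up, not pg_upgrade's actual exec_prog() signature; the comment also reflects Tom's point upthread that fclose() flushes buffered output itself:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for pg_upgrade's exec_prog(): echo the command
 * into the log file before running it, so the log shows which command
 * produced the output that follows. */
static int
exec_and_log(const char *log_file, const char *cmd)
{
	FILE   *log = fopen(log_file, "a");

	if (log == NULL)
		return -1;
	/* Echo the command itself so the log is self-describing */
	fprintf(log, "command: %s\n", cmd);
	/* No fflush() needed: fclose() flushes any buffered output itself */
	fclose(log);
	return system(cmd);
}
```

With something like this in place, reading the log no longer requires tracing the pg_upgrade source to see what was executed.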
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Josh Berkus writes:
> Well, I think it's "plausible but wrong under at least some common
> circumstances". In addition to seeking, it ignores FS cache effects
> (not that I have any idea how to account for these mathematically). It
> also makes the assumption that 3 autovacuum workers running at 1/3 speed
> each is better than having one worker running at full speed, which is
> debatable.

Well, no, not really, because the original implementation with only one worker was pretty untenable. But maybe we need some concept like only one worker working on *big* tables? Or at least, fewer than max_workers of them.

regards, tom lane
Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
Hi Ants,

> -----Original Message-----
> From: Ants Aasma [mailto:a...@cybertec.at]
> Sent: Wednesday, June 27, 2012 9:23 PM
> To: Robert Haas
> Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois Deliege
> Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
>
> Sorry about the delay in answering. I have been swamped with non-PG
> related things lately.
>
> On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas wrote:
> > On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas wrote:
> >> On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita wrote:
>
> I'm confused by this remark, because surely the query planner does it this
> way only if there's no LIMIT. When there is a LIMIT, we choose based on
> the startup cost plus the estimated fraction of the total cost we expect
> to pay based on dividing the LIMIT by the overall row count estimate. Or
> is this not what you're talking about?
>
> My reasoning was that query_planner returns the cheapest-total path
> and the cheapest fractional presorted path (by the aggregation pathkeys).
> When evaluating hash aggregates with this patch, these two are indeed
> compared considering the estimated fraction of the total cost, but this
> might miss the cheapest fractional unordered path for lazy hash aggregates.
>
> Reviewing the code now, I discovered this path could be picked out from
> the pathlist, just like it is done by
> get_cheapest_fractional_path_for_pathkeys when pathkeys is nil. This
> would need to be returned in addition to the other two paths. To
> minimize overhead, this should only be done when we possibly want to
> consider lazy hash aggregation (there is a group clause with no
> aggregates and grouping is hashable). But this is starting to get
> pretty crufty considering that there doesn't seem to be any really
> compelling use case for this.
>
> > Ants, do you intend to update this patch for this CommitFest? Or at
> > all? It seems nobody's too excited about this, so I'm not sure
> > whether it makes sense for you to put more work on it. But please
> > advise as to your plans.
>
> If anyone thinks that this patch might be worth considering, then I'm
> prepared to do minor cleanup this CF (I saw some possibly unnecessary
> cruft in agg_fill_hash_and_retrieve). On the other hand, if you think
> the use case is too marginal to consider for inclusion then I won't
> shed a tear if this gets rejected. For me this was mostly a learning
> experience for poking around in the planner.

Honestly, I'm not sure that it's worth including this, considering the use case...

Thanks,

Best regards,
Etsuro Fujita

> Ants Aasma
> --
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
* Josh Berkus (j...@agliodbs.com) wrote:
> I don't find Stephen's proposal of goal-based solutions to be practical.
> A goal-based approach makes the assumption that database activity is
> predictable, and IME most databases are anything but.

We're talking about over the entire transaction space, and we can be pretty liberal, in my view, with our estimates. If we get it right, we might risk doing more autovacs for wraparound than strictly necessary, but they should happen over a sufficient time that it doesn't cause performance issues.

One definite problem with this, of course, is that the wraparound autovac can't be stopped and restarted, and anything that increases the amount of wall-clock time required to complete the autovac will necessarily increase the risk that we'll lose a bunch of work due to a database restart.

Thanks,

Stephen
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
> I'm not especially sold on your theory that there's some behavior that
> forces such convergence, but it's certainly plausible that there was,
> say, a schema alteration applied to all of those partitions at about the
> same time. In any case, as Robert has been saying, it seems like it
> would be smart to try to get autovacuum to spread out the
> anti-wraparound work a bit better when it's faced with a lot of tables
> with similar relfrozenxid values.

Well, I think we can go even further than that. I think one of the fundamental problems is that our "opportunistic" vacuum XID approach is essentially broken for any table which doesn't receive continuous updates/deletes (I think Chris Browne makes largely the same point). The way opportunism currently works is via vacuum_freeze_table_age, which says "if you were going to vacuum this table *anyway*, and its relfrozenxid is old enough, then full-scan it". That works fine for tables getting constant UPDATEs to avoid hitting the wraparound deadline, but tables which have stopped getting activity, or are insert-only, never get it.

What we should have instead is some opportunism in autovacuum which says: "If I have otherwise idle workers, and the system isn't too busy, find the table with the oldest relfrozenxid which is over autovacuum_max_freeze_age/2 and vacuum-full-scan it." The key difficulty there is "if the system isn't too busy". That's a hard thing to determine, and subject to frequent change. An opportunistic solution would still be useful without that requirement, but not as helpful.

I don't find Stephen's proposal of goal-based solutions to be practical. A goal-based approach makes the assumption that database activity is predictable, and IME most databases are anything but.

A second obstacle to "opportunistic wraparound vacuum" is that wraparound vacuum is not interruptible. If you have to kill it off and do something else for a couple of hours, it can't pick up where it left off; it needs to scan the whole table from the beginning again.

> I continue to maintain that this problem is unrelated to wraparound as
> such, and that thinking it is is a great way to design a bad solution.
> There are any number of reasons why autovacuum might need to run
> max_workers at once. What we need to look at is making sure that they
> don't run the system into the ground when that happens.

100% agree.

> Since your users weren't complaining about performance with one or two
> autovac workers running (were they?),

No, it's when we hit 3 that it fell over. Thresholds vary with memory and table size, of course.

BTW, the primary reason I think (based on a glance at system stats) this drove the system to its knees was that the simultaneous wraparound vacuum of 3 old-cold tables evicted all of the "current" data out of the FS cache, forcing user queries which would normally hit the FS cache onto disk. I/O throughput was NOT at 100% capacity. During busy periods, a single wraparound vacuum wasn't enough to clear the FS cache because it's competing on equal terms with user access to data. But three avworkers "ganged up" on the user queries and kicked the tar out of them.

Unfortunately, for the 5-worker system, I didn't find out about the issue until after it was over, and I know it was related to wraparound only because we were logging autovacuum. So I don't know if it was the same case.

There are also problems with our defaults and measurements for the various vacuum_freeze settings, but changing those won't really fix the underlying problem, so it's not worth fiddling with them.

The other solution, as mentioned last year, is to come up with a way in which old-cold data doesn't need to be vacuumed *at all*. This would be the ideal solution, but it's not clear how to implement it, since any wraparound-counting solution would bloat the CLOG intolerably.

> we can assume that the cost-delay
> settings were such as to not create a problem in that scenario. So it
> seems to me that it's down to autovac_balance_cost(). Either there's
> a plain-vanilla bug in there, or seek costs are breaking the assumption
> that it's okay to give N workers each 1/Nth of the single-worker I/O
> capacity.

Yeah, I think our I/O balancing approach was too simplistic to deal with situations like this one. Factors I think break it are:

* modifying cost-limit/cost-delay doesn't translate exactly into 1:1 modifying I/O (in fact, it seems highly unlikely that it does)
* seek costs, as you mention
* FS cache issues and competition with user queries (per above)

> As far as bugs are concerned, I wonder if the premise of the calculation
>
> * The idea here is that we ration out I/O equally. The amount of I/O
> * that a worker can consume is determined by cost_limit/cost_delay, so we
> * try to equalize those ratios rather than the raw limit settings.
>
> might be wrong in itself? The ratio idea seems plausible but ...

Well, I think it's "plausible but wrong under at least some common circumstances". In addition to seeking, it ignores FS cache effects (not that I have any idea how to account for these mathematically). It also makes the assumption that 3 autovacuum workers running at 1/3 speed each is better than having one worker running at full speed, which is debatable.
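The opportunistic selection rule floated in this thread ("find the table with the oldest relfrozenxid which is over autovacuum_max_freeze_age/2") can be sketched in isolation; the table_info struct and age values below are illustrative stand-ins, not catalog code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
	const char *relname;
	unsigned	frozenxid_age;	/* age(relfrozenxid), in transactions */
} table_info;

/* Among tables whose age exceeds max_freeze_age/2, return the one with
 * the oldest relfrozenxid; NULL if nothing qualifies yet. */
static const table_info *
pick_opportunistic_target(const table_info *tables, size_t ntables,
						  unsigned max_freeze_age)
{
	const table_info *best = NULL;
	size_t		i;

	for (i = 0; i < ntables; i++)
	{
		if (tables[i].frozenxid_age <= max_freeze_age / 2)
			continue;			/* not yet worth a full-scan vacuum */
		if (best == NULL || tables[i].frozenxid_age > best->frozenxid_age)
			best = &tables[i];
	}
	return best;
}
```

The hard part the thread identifies, deciding whether "the system isn't too busy", is deliberately absent from this sketch.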
Re: [HACKERS] Notify system doesn't recover from "No space" error
Christoph Berg writes:
> Warming up an old thread here - we ran into the same problem.

Thanks for the example proving it is repeatable. I dug into this a bit and found the problem.

When a page of pg_notify is filled, asyncQueueAddEntries() advances the global QUEUE_HEAD to point to the next page before it calls SimpleLruZeroPage() to make that page exist in slru.c. But SimpleLruZeroPage could fail --- even though it doesn't attempt to write out the new page, it might need to dump some existing dirty page from shared memory before it can allocate a slot to the new page. If it does fail, then QUEUE_HEAD is pointing at a page number that doesn't exist either on disk or in slru.c buffers. Subsequent attempts to execute asyncQueueAddEntries() will fail because SimpleLruReadPage() says "page? what page?". And there is noplace else in async.c that would try to create the missing page. This error will persist until the server is restarted (thus resetting the state of pg_notify), even if the underlying disk-full condition is cleared.

The solution is to make sure we don't advance QUEUE_HEAD into the new page until SimpleLruZeroPage has created it. There are a couple of ways the code could be fixed, but what I chose to do in the attached proposed patch is to make asyncQueueAddEntries work with a local queue_head pointer, which is only copied back to shared memory on successful exit.

Comments, objections?
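The pattern of the fix can be sketched in isolation: mutate a local copy of the shared position, and publish it back only after every fallible step has succeeded. The types and the failure-injection flag below are stand-ins, not the real async.c signatures:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
	int			page;
	int			offset;
} QueuePosition;

#define PAGE_SIZE 8192

static QueuePosition shared_head;	/* stands in for QUEUE_HEAD */

/* Stand-in for SimpleLruZeroPage(); 'fail' injects an out-of-space error */
static bool
zero_new_page(int pageno, bool fail)
{
	(void) pageno;
	return !fail;
}

static bool
add_entry(int entry_len, bool simulate_failure)
{
	QueuePosition head = shared_head;	/* local working copy */

	head.offset += entry_len;
	if (head.offset >= PAGE_SIZE)
	{
		/* Need a new page; this step can fail */
		head.page++;
		head.offset = 0;
		if (!zero_new_page(head.page, simulate_failure))
			return false;		/* shared_head left untouched */
	}
	shared_head = head;			/* publish only on success */
	return true;
}
```

On failure, the shared position still points at a page that exists, so a later retry starts from a consistent state, which is the invariant the patch restores.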
regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index fcb087ed15dfbb4d567ee0c742e1db8ec1353a2d..6fcd9093268e7bc5aaf6102fa9d6adcdcced1d6d 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
*** static ListCell *
*** 1285,1290 ****
--- 1285,1291 ----
  asyncQueueAddEntries(ListCell *nextNotify)
  {
  	AsyncQueueEntry qe;
+ 	QueuePosition queue_head;
  	int			pageno;
  	int			offset;
  	int			slotno;
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1292,1299 ****
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(QUEUE_HEAD);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
--- 1293,1313 ----
  	/* We hold both AsyncQueueLock and AsyncCtlLock during this operation */
  	LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
  
+ 	/*
+ 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
+ 	 * memory upon exiting.  The reason for this is that if we have to advance
+ 	 * to a new page, SimpleLruZeroPage might fail (out of disk space, for
+ 	 * instance), and we must not advance QUEUE_HEAD if it does.  (Otherwise,
+ 	 * subsequent insertions would try to put entries into a page that slru.c
+ 	 * thinks doesn't exist yet.)  So, use a local position variable.  Note
+ 	 * that if we do fail, any already-inserted queue entries are forgotten;
+ 	 * this is okay, since they'd be useless anyway after our transaction
+ 	 * rolls back.
+ 	 */
+ 	queue_head = QUEUE_HEAD;
+ 
  	/* Fetch the current page */
! 	pageno = QUEUE_POS_PAGE(queue_head);
  	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
  	/* Note we mark the page dirty before writing in it */
  	AsyncCtl->shared->page_dirty[slotno] = true;
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1305,1311 ****
  	/* Construct a valid queue entry in local variable qe */
  	asyncQueueNotificationToEntry(n, &qe);
  
! 	offset = QUEUE_POS_OFFSET(QUEUE_HEAD);
  
  	/* Check whether the entry really fits on the current page */
  	if (offset + qe.length <= QUEUE_PAGESIZE)
--- 1319,1325 ----
  	/* Construct a valid queue entry in local variable qe */
  	asyncQueueNotificationToEntry(n, &qe);
  
! 	offset = QUEUE_POS_OFFSET(queue_head);
  
  	/* Check whether the entry really fits on the current page */
  	if (offset + qe.length <= QUEUE_PAGESIZE)
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1331,1338 ****
  			&qe, qe.length);
  
! 	/* Advance QUEUE_HEAD appropriately, and note if page is full */
! 	if (asyncQueueAdvance(&(QUEUE_HEAD), qe.length))
  	{
  		/*
  		 * Page is full, so we're done here, but first fill the next page
--- 1345,1352 ----
  			&qe, qe.length);
  
! 	/* Advance queue_head appropriately, and detect if page is full */
! 	if (asyncQueueAdvance(&(queue_head), qe.length))
  	{
  		/*
  		 * Page is full, so we're done here, but first fill the next page
*** asyncQueueAddEntries(ListCell *nextNotif
*** 1342,1353 ****
  		 * asyncQueueIsFull() ensured that there is room to create this
  		 * page without overrunning the queue.
  		 */
! 		slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
  
  		/* And exit the loop */
  		break;
  	}
  }
LWLoc
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Josh Berkus writes:
> So there are two parts to this problem, each of which needs a different
> solution:
> 1. Databases can inadvertently get to the state where many tables need
> wraparound vacuuming at exactly the same time, especially if they have
> many "cold" data partition tables.

I'm not especially sold on your theory that there's some behavior that forces such convergence, but it's certainly plausible that there was, say, a schema alteration applied to all of those partitions at about the same time. In any case, as Robert has been saying, it seems like it would be smart to try to get autovacuum to spread out the anti-wraparound work a bit better when it's faced with a lot of tables with similar relfrozenxid values.

> 2. When we do hit wraparound thresholds for multiple tables, autovacuum
> has no hesitation about doing autovacuum_max_workers worth of wraparound
> vacuum simultaneously, even when that exceeds the I/O capacity of the
> system.

I continue to maintain that this problem is unrelated to wraparound as such, and that thinking it is is a great way to design a bad solution. There are any number of reasons why autovacuum might need to run max_workers at once. What we need to look at is making sure that they don't run the system into the ground when that happens.

Since your users weren't complaining about performance with one or two autovac workers running (were they?), we can assume that the cost-delay settings were such as to not create a problem in that scenario. So it seems to me that it's down to autovac_balance_cost(). Either there's a plain-vanilla bug in there, or seek costs are breaking the assumption that it's okay to give N workers each 1/Nth of the single-worker I/O capacity.

As far as bugs are concerned, I wonder if the premise of the calculation

 * The idea here is that we ration out I/O equally. The amount of I/O
 * that a worker can consume is determined by cost_limit/cost_delay, so we
 * try to equalize those ratios rather than the raw limit settings.

might be wrong in itself? The ratio idea seems plausible but ...

regards, tom lane
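The rationing arithmetic that comment describes can be written out explicitly. This is an illustration of the ratio idea under discussion, not the actual autovac_balance_cost() code:

```c
#include <assert.h>

/* Each of N workers gets an equal 1/Nth share of the single-worker I/O
 * budget, where the budget is expressed as the ratio cost_limit/cost_delay.
 * The share is converted back into a cost_limit for a worker running with
 * the given delay. */
static int
balanced_cost_limit(int base_limit, int base_delay, int nworkers,
					int worker_delay)
{
	double		total_ratio = (double) base_limit / base_delay;
	double		share = total_ratio / nworkers;

	return (int) (share * worker_delay);
}
```

Seek costs and FS-cache competition are exactly the effects a formula like this cannot see, which is the question being raised about the premise.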
Re: [HACKERS] initdb check_need_password fix
On Thu, Jun 28, 2012 at 3:48 PM, Peter Eisentraut wrote:
> If you run initdb -A md5, you get an error
>
> initdb: must specify a password for the superuser to enable md5 authentication
>
> In 9.2, when you run something like initdb --auth-local=peer
> --auth-host=md5, you still get that error, which I think is a mistake.
> The error should only be shown if both authentication choices are
> password-like. Hence I propose the attached patch.

Looks right to me.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
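The check the patch implies can be sketched as follows; the helper names are illustrative, not initdb's actual identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool
auth_needs_password(const char *method)
{
	return strcmp(method, "md5") == 0 ||
		   strcmp(method, "password") == 0;
}

/* Require a superuser password only when *both* the local and the host
 * authentication methods are password-based; e.g. --auth-local=peer
 * --auth-host=md5 should no longer trigger the error. */
static bool
must_have_password(const char *authmethodlocal, const char *authmethodhost)
{
	return auth_needs_password(authmethodlocal) &&
		   auth_needs_password(authmethodhost);
}
```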
Re: [HACKERS] plpython issue with Win64 (PG 9.2)
On 27/06/12 13:57, Jan Urbański wrote:
> On 27/06/12 11:51, Asif Naeem wrote:
>> Hi,
>>
>> On Windows 7 64bit, plpython is causing a server crash with the following
>> test case:
>>
>> CREATE PROCEDURAL LANGUAGE 'plpython3u';
>> CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
>> RETURNS integer
>> AS $$
>>   if a > b:
>>     return a
>>   return b
>> $$ LANGUAGE plpython3u;
>> SELECT pymax(1, 2);
>>
>> I think the primary reason that triggers this issue is that when the
>> function PLyUnicode_Bytes() calls PyUnicode_AsEncodedString( , WIN1252
>> /* server encoding */, ), it fails with null. I built the latest pg 9.2
>> source code with python 3.2.2.3 using Visual Studio 2010. Thanks.
>
> I'll try to reproduce this on Linux, which should be possible given the
> results of your investigation.

Your analysis is correct, I managed to reproduce this by injecting

    serverenc = "win1252";

into PLyUnicode_Bytes. The comment in that function says that Python understands all PostgreSQL encoding names except for SQL_ASCII, but that's not really true. In your case GetDatabaseEncodingName() returns "WIN1252" and Python accepts "CP1252".

I'm wondering how this should be fixed. Just by adding more special cases in PLyUnicode_Bytes? Even if we add a switch statement that would convert PG_WIN1250 into "CP1250", Python can still raise an exception when encoding (for various reasons). How about replacing the PLy_elog there with just an elog? This loses traceback support and the Python exception message, which could be helpful for debugging (something like "invalid character for encoding cp1250"). OTOH, I'm uneasy about invoking the entire PLy_elog machinery from a function that's as low-level as PLyUnicode_Bytes.

Lastly, we map SQL_ASCII to "ascii", which is arguably wrong. The function is supposed to return bytes in the server encoding, and under SQL_ASCII that probably means we can return anything (ie. use any encoding we deem useful). Using "ascii" as the Python codec name will raise an error on anything that has the high bit set.

So: I'd add code to translate WINxxx into CPxxx when choosing the Python codec to use, change PLy_elog to elog in PLyUnicode_Bytes, and leave the SQL_ASCII case alone, as there were no complaints and people using SQL_ASCII are asking for it anyway.

Cheers, Jan
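The WINxxx-to-CPxxx translation proposed here could look roughly like this string-level sketch; a real fix would more likely switch on the pg_enc enum value, and the function name is made up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Translate a PostgreSQL encoding name into a spelling that Python's
 * codec lookup accepts; only the WINxxxx family needs rewriting to
 * CPxxxx, e.g. "WIN1252" -> "CP1252". */
static const char *
pg_encoding_to_python_codec(const char *pg_name, char *buf, size_t buflen)
{
	if (strncmp(pg_name, "WIN", 3) == 0 && pg_name[3] != '\0')
	{
		snprintf(buf, buflen, "CP%s", pg_name + 3);
		return buf;
	}
	return pg_name;				/* Python understands most other names */
}
```

As noted above, even with the translated name Python can still raise an exception while encoding, so the error-reporting question (PLy_elog vs. elog) remains separate.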
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, Jun 28, 2012 at 2:32 PM, Peter Geoghegan wrote:
> On 28 June 2012 22:22, Daniel Farina wrote:
>> All in all, I don't think this can be a very productive discussion
>> unless someone just pitches an equal or better name overall in terms of
>> conciseness and descriptiveness. I'd rather optimize for those
>> attributes. Old advice is old; that's the nature of the beast.
>
> Robert suggested wal_flush_delay, which does more accurately describe
> what happens now.

Well, I learned something from reading this name, having not followed the mechanism too closely. I like it.

-- fdr
Re: [HACKERS] embedded list v2
On Thursday, June 28, 2012 11:45:08 PM Alvaro Herrera wrote:
> Excerpts from Andres Freund's message of jue jun 28 17:06:49 -0400 2012:
>> On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote:
>>> What I wonder is how hard it would be to remove catcache.h's structs
>>> into the implementation. Thats the reason why the old and new list
>>> implementation currently is included all over the backend...
>>
>> Moving them into the implementation isn't possible, but catcache.h being
>> included just about everywhere simply isn't needed.
>>
>> It being included everywhere was introduced by a series of commits from
>> Bruce:
>> b85a965f5fc7243d0386085e12f7a6c836503b42
>> b43ebe5f83b28e06a3fd933b989aeccf0df7844a
>> e0522505bd13bc5aae993fc50b8f420665d78e96
>> and others
>>
>> That looks broken. An implementation file not including its own header...
>> A minimal patch to fix this particular problem is attached (looks like
>> there are others in the series).
>
> Hmm, I think this is against project policy -- we do want each header to
> be compilable separately. I would go instead the way of splitting
> resowner.h in two or more pieces.

It was done nearly the same way in catcache.h before Bruce changed things. You can still see the remnants of that in syscache.h:

    /* list-search interface.  Users of this must import catcache.h too */
    extern struct catclist *SearchSysCacheList(int cacheId, int nkeys,
                                               Datum key1, Datum key2,
                                               Datum key3, Datum key4);

The only difference is that gcc warns if you declare a struct in a parameter - thats why I forward declared it explicitly...

resowner.h still compiles standalone and is still usable. You can even call ResourceOwnerRememberCatCacheListRef if you get the list parameter from somewhere else.

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] embedded list v2
Excerpts from Andres Freund's message of jue jun 28 17:06:49 -0400 2012:
> On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote:
>> What I wonder is how hard it would be to remove catcache.h's structs into
>> the implementation. Thats the reason why the old and new list
>> implementation currently is included all over the backend...
>
> Moving them into the implementation isn't possible, but catcache.h being
> included just about everywhere simply isn't needed.
>
> It being included everywhere was introduced by a series of commits from Bruce:
> b85a965f5fc7243d0386085e12f7a6c836503b42
> b43ebe5f83b28e06a3fd933b989aeccf0df7844a
> e0522505bd13bc5aae993fc50b8f420665d78e96
> and others
>
> That looks broken. An implementation file not including its own header... A
> minimal patch to fix this particular problem is attached (looks like there
> are others in the series).

Hmm, I think this is against project policy -- we do want each header to be compilable separately. I would go instead the way of splitting resowner.h in two or more pieces.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 22:22, Daniel Farina wrote:
> All in all, I don't think this can be a very productive discussion
> unless someone just pitches an equal or better name overall in terms of
> conciseness and descriptiveness. I'd rather optimize for those
> attributes. Old advice is old; that's the nature of the beast.

Robert suggested wal_flush_delay, which does more accurately describe what happens now. Old advice is old, but we frequently go to reasonable lengths to protect users from making predictable mistakes. The position that we shouldn't change the name might make sense if there was any downside to doing so, but there isn't.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, Jun 28, 2012 at 1:32 PM, Peter Geoghegan wrote:
>> Anyway, it seems that no one other than you and I is very excited
>> about renaming this for whatever reason, so maybe we should leave it
>> at that.
>
> I think not changing the name is a really bad decision, and I am
> personally unhappy about it. I immediately agreed to your proposed
> adjustment to the patch without really thinking that it mattered,
> because it had no plausible downside, so why not?
>
> That's all I have to say about the matter. If it isn't obvious that
> the name should be changed, based on what I've already said, it never
> will be.

All in all, I don't think this can be a very productive discussion unless someone just pitches an equal or better name overall in terms of conciseness and descriptiveness. I'd rather optimize for those attributes. Old advice is old; that's the nature of the beast.

The one benefit I can think of for name shuffling is avoiding accidental negation of the optimization when porting Postgres configuration from older clusters, and even then I don't think the rename-on-change strategy is a scalable one; a tuning-linting tool is probably the only scalable option if one is concerned about making sure that those forward-porting their configurations or managing multiple postgres versions don't shoot themselves in the foot.

-- fdr
Re: [HACKERS] embedded list v2
Excerpts from Andres Freund's message of jue jun 28 16:03:26 -0400 2012:
> On Thursday, June 28, 2012 09:47:05 PM Alvaro Herrera wrote:
>> Excerpts from Andres Freund's message of jue jun 28 14:20:59 -0400 2012:
>>> Looks good now?
>>
>> The one thing I dislike about this code is the names you've chosen. I
>> mean, ilist_s_stuff and ilist_d_stuff. I mean, why not just Slist_foo
>> and Dlist_bar, say? As far as I can tell, you've chosen the "i" prefix
>> because it's "integrated" or "inline", but this seems to me a rather
>> irrelevant implementation detail that's of little use to the callers.
>
> Well, its not irrelevant because you actually need to change the contained
> structs to use it. I find that a pretty relevant distinction.

Sure, at that point it is relevant. Once you're past that, there's no point. I mean, you would look up how it's used in the header comment of the implementation, and then just refer to the API.

>> Also, I don't find so great an idea to have everything in a single file.
>> Is there anything wrong with separating singly and doubly linked lists
>> each to its own file? Other than you not liking it, I mean. As a
>> person who spends some time trying to untangle header dependencies, I
>> would appreciate keeping stuff as lean as possible. However, since
>> nobody else seems to have commented on this, maybe it's just me.
>
> Robert had the same comment, its not just you...
>
> It would mean duplicating the ugliness around the conditional inlining, the
> comment explaining how to use the stuff (because its basically used the same
> way for singly and doubly linked lists), and you would need to #define
> ilist_container twice or have a third file.
>
> Just seems too much overhead for ~100 lines (the singly linked list
> implementation).

Well, then don't duplicate a comment -- create a README.lists and refer to it in the comments. Not sure about the ilist_container stuff, but in principle I don't have a problem with having a file with a single #define that's included by two other headers.

> What I wonder is how hard it would be to remove catcache.h's structs into the
> implementation. Thats the reason why the old and new list implementation
> currently is included all over the backend...

Yeah, catcache.h is a pretty ugly beast. I'm sure there are ways to behead it.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
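For readers following along, an embedded (intrusive) list of the kind under discussion looks roughly like this; the ilist_d_* and ilist_container names follow the patch's apparent convention but are illustrative, not the patch's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* The link node lives *inside* the contained struct, so list membership
 * needs no separate cell allocations. */
typedef struct ilist_d_node
{
	struct ilist_d_node *prev;
	struct ilist_d_node *next;
} ilist_d_node;

typedef struct
{
	ilist_d_node head;			/* circular: an empty list points to itself */
} ilist_d_head;

/* Recover the containing struct from a pointer to its embedded node */
#define ilist_container(type, field, ptr) \
	((type *) ((char *) (ptr) - offsetof(type, field)))

static void
ilist_d_init(ilist_d_head *h)
{
	h->head.prev = h->head.next = &h->head;
}

static void
ilist_d_push_head(ilist_d_head *h, ilist_d_node *n)
{
	n->next = h->head.next;
	n->prev = &h->head;
	h->head.next->prev = n;
	h->head.next = n;
}
```

The "i" prefix being debated corresponds to exactly this embedding requirement: callers must add the node field to their own structs before they can use the list.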
Re: [HACKERS] embedded list v2
On Thursday, June 28, 2012 10:03:26 PM Andres Freund wrote: > What I wonder is how hard it would be to remove catcache.h's structs into > the implementation. Thats the reason why the old and new list > implementation currently is included all over the backend... Moving them into the implementation isn't possible, but catcache.h being included just about everywhere simply isn't needed. It being included everywhere was introduced by a series of commits from Bruce: b85a965f5fc7243d0386085e12f7a6c836503b42 b43ebe5f83b28e06a3fd933b989aeccf0df7844a e0522505bd13bc5aae993fc50b8f420665d78e96 and others That looks broken. An implementation file not including its own header... A minimal patch to fix this particular problem is attached (looks like there are others in the series). Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 45e2c358e6a21e837f13731981da2644bcb57a88 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 28 Jun 2012 23:03:44 +0200 Subject: [PATCH] Stop including catcache.h from syscache.h syscache.h used to not rely on catcache.h and even today ships with the comment "Users of this must import catcache.h too" for the one function requiring catcache.h knowledge. This was changed in a series of commits including: b85a965f5fc7243d0386085e12f7a6c836503b42 b43ebe5f83b28e06a3fd933b989aeccf0df7844a e0522505bd13bc5aae993fc50b8f420665d78e96 Change it back. 
---
 src/backend/access/transam/xact.c     |  1 +
 src/backend/catalog/namespace.c       |  1 +
 src/backend/catalog/pg_conversion.c   |  1 +
 src/backend/catalog/pg_enum.c         |  1 +
 src/backend/utils/adt/acl.c           |  1 +
 src/backend/utils/cache/attoptcache.c |  1 +
 src/backend/utils/cache/catcache.c    |  1 +
 src/backend/utils/cache/inval.c       |  1 +
 src/backend/utils/cache/lsyscache.c   |  1 +
 src/backend/utils/cache/relcache.c    |  1 +
 src/backend/utils/cache/spccache.c    |  1 +
 src/backend/utils/cache/syscache.c    |  1 +
 src/backend/utils/cache/ts_cache.c    |  1 +
 src/backend/utils/cache/typcache.c    |  1 +
 src/backend/utils/resowner/resowner.c |  5 +++--
 src/include/utils/resowner.h          | 10 ++
 src/include/utils/snapmgr.h           |  1 +
 src/include/utils/syscache.h          |  2 +-
 18 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 4755ee6..1f743f7 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -44,6 +44,7 @@
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
+#include "utils/catcache.h"
 #include "utils/combocid.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 20850ab..10ad82b 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -46,6 +46,7 @@
 #include "storage/sinval.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index f86c84f..358bd39 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_proc.h"
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 41665c1..20e26c4 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -23,6 +23,7 @@
 #include "storage/lmgr.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/fmgroids.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 77322a1..2cc87d8 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -29,6 +29,7 @@
 #include "miscadmin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/catcache.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
diff --git a/src/backend/utils/cache/attoptcache.c b/src/backend/utils/cache/attoptcache.c
index e01ae21..5d872ba 100644
--- a/src/backend/utils/cache/attoptcache.c
+++ b/src/backend/utils/cache/attoptcache.c
@@ -18,6 +18,7 @@
 #include "access/reloptions.h"
 #include "utils/attoptcache.h"
+#include "utils/catcache.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
 #include "utils/syscache.h"
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 0307b96..f27d90d 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
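The commit message's point that syscache.h "used to not rely on catcache.h" works because C only needs a struct's *name*, not its layout, to declare pointers to it. A minimal sketch of that decoupling idea, with the two headers simulated by comments in one file and a deliberately simplified, hypothetical `SearchSysCacheList` signature (the real one takes more parameters):

```c
/* --- "syscache.h" (trimmed sketch): no #include "catcache.h" needed ---
 * A forward declaration gives callers an opaque pointer type; only code
 * that dereferences the struct must include the full definition. */
struct catclist;                                 /* opaque here */
struct catclist *SearchSysCacheList(int cacheid); /* hypothetical signature */

/* --- "catcache.h" (trimmed sketch): full definition for real users --- */
struct catclist
{
    int         n_members;                       /* stand-in field */
};

/* --- an implementation file that includes both "headers" --- */
#include <stdlib.h>

struct catclist *
SearchSysCacheList(int cacheid)
{
    /* stand-in for the real catalog-cache lookup */
    struct catclist *cl = malloc(sizeof(struct catclist));

    cl->n_members = cacheid;
    return cl;
}
```

Files that only pass `struct catclist *` around compile against the trimmed header alone, which is why dropping the include from syscache.h shrinks header dependencies backend-wide.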
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 20:55, Robert Haas wrote: >>> I don't buy this line of reasoning at all. If we're going to rename >>> the GUC, it should be for accuracy, not PR value. If we start >>> renaming something every time we improve it, we're going to go nuts. >>> We improved lots of things in 9.2; they didn't all get renamed. >> >> That is a false equivalence, and you know it. > > A false equivalence between what and what? Changing the name in order to create some buzz about it, or to highlight an improvement, is quite different to changing the name because the practical advice surrounding the setting has suddenly altered, almost diametrically, and after a very long period. >> Who said anything about PR value? I'm concerned with not confusing users. > > My point here is that we don't normally rename things because they > work better than they used to. Whether that is called PR value or not > confusing users, we don't normally do it. That makes sense. I don't dispute that. We aren't talking about an incremental improvement here, are we? We're talking about night and day. That is probably an unprecedented state of affairs, so I don't see any reason to invoke precedent. The reason the VACUUM FULL situation isn't at all comparable here is because: For VACUUM FULL: User googles "VACUUM FULL", gets something from the 8.* docs, or something from that era. (This happens more often than not for any given Postgres term, as you rightly complained about at PgCon). They see something that says "you should probably use cluster instead". They do so, and are no worse off for it. For commit_delay: User googles commit_delay, and sees something from that same period that says "you don't want to use this". They naturally assume that they don't. They lose whatever benefit they might have had from the new implementation. > We sometimes rename things because they do something *different* than > what they used to do. But not just because they work better. It does do something different. 
That's why you proposed changing the name. > Anyway, it seems that no one other than you and I is very excited > about renaming this for whatever reason, so maybe we should leave it > at that. I think not changing the name is a really bad decision, and I am personally unhappy about it. I immediately agreed to your proposed adjustment to the patch without really thinking that it mattered, because it had no plausible downside, so why not? That's all I have to say about the matter. If it isn't obvious that the name should be changed, based on what I've already said, it never will be. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] embedded list v2
On Thursday, June 28, 2012 09:47:05 PM Alvaro Herrera wrote: > Excerpts from Andres Freund's message of jue jun 28 14:20:59 -0400 2012: > > Looks good now? > > The one thing I dislike about this code is the names you've chosen. I > mean, ilist_s_stuff and ilist_d_stuff. I mean, why not just Slist_foo > and Dlist_bar, say? As far as I can tell, you've chosen the "i" prefix > because it's "integrated" or "inline", but this seems to me a rather > irrelevant implementation detail that's of little use to the callers. Well, it's not irrelevant because you actually need to change the contained structs to use it. I find that a pretty relevant distinction. > Also, I don't find so great an idea to have everything in a single file. > Is there anything wrong with separating singly and doubly linked lists > each to its own file? Other than you not liking it, I mean. As a > person who spends some time trying to untangle header dependencies, I > would appreciate keeping stuff as lean as possible. However, since > nobody else seems to have commented on this, maybe it's just me. Robert had the same comment, it's not just you... It would mean duplicating the ugliness around the conditional inlining, the comment explaining how to use the stuff (because it's basically used the same way for single and double linked lists), and you would need to #define ilist_container twice or have a third file. Just seems too much overhead for ~100 lines (the single linked list implementation). What I wonder is how hard it would be to remove catcache.h's structs into the implementation. That's the reason why the old and new list implementation currently is included all over the backend... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
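The distinction Andres is defending ("you actually need to change the contained structs") is the defining property of an embedded/intrusive list. A minimal sketch, with hypothetical names loosely modeled on the patch's (not its actual API): the link node is a *member* of the payload struct, and an offsetof-based container macro recovers the payload from a node pointer, so no per-cell allocation is needed, unlike pg_list.h.

```c
#include <stddef.h>

/* doubly linked list node, embedded inside the contained struct */
typedef struct ilist_d_node
{
    struct ilist_d_node *prev;
    struct ilist_d_node *next;
} ilist_d_node;

/* circular list: head.next is the first element, head.prev the last */
typedef struct ilist_d_head
{
    ilist_d_node head;
} ilist_d_head;

/* recover the containing struct from a pointer to its embedded node */
#define ilist_container(type, field, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))

static void
ilist_d_init(ilist_d_head *h)
{
    h->head.next = h->head.prev = &h->head;
}

static void
ilist_d_push_front(ilist_d_head *h, ilist_d_node *n)
{
    n->next = h->head.next;
    n->prev = &h->head;
    n->next->prev = n;
    h->head.next = n;
}

/* example payload: the node is a member, which is exactly why callers
 * must modify their structs to use this kind of list */
typedef struct MyItem
{
    int         value;
    ilist_d_node node;
} MyItem;
```

Because the node lives inside `MyItem`, inserting an item never allocates, and removing it needs only the node pointer, which is what makes this attractive for hot backend paths.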
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Thu, Jun 28, 2012 at 3:03 PM, Josh Berkus wrote: > 1. Databases can inadvertently get to the state where many tables need > wraparound vacuuming at exactly the same time, especially if they have > many "cold" data partition tables. This suggests that this should be handled rather earlier, and with some attempt to not do them all simultaneously. In effect, if there are 25 tables that will need wraparound vacuums in the next million transactions, it is presumably beneficial to start hitting on them right away, ideally one at a time, so as to draw their future needs further apart. The loose thought is that any time autovac isn't very busy, it should consider (perhaps based on probability?) picking a table that is in a cluster of tables that currently have wraparound needs at about the same time, and, in effect, spread that cluster out. I suppose there are two considerations, that conflict somewhat: a) If there are tables that Absolutely Require wraparound vacuuming, Right Soon Now, there's nothing to help this. They MUST be vacuumed, otherwise the system will get very unhappy. b) It's undesirable to *worsen* things by 'clustering' future wraparound vacuums together, which gets induced any time autovac is continually vacuuming a series of tables. If 25 tables get vacuumed right around now, then that may cluster their next wraparound vacuum to 2^31 transactions from 'right around now.' But there's no helping a). I suppose this suggests having an autovac thread that is 'devoted' to spreading out future wraparound vacuums. - If a *lot* of tables were just vacuumed recently, then it shouldn't do anything, as Right Now is a cluster of 'badness.' 
- It should group tables by slicing their next wraparounds (grouping by rounding wraparound txid to the nearest, say, 10M or 20M), and consider vacuuming a table Right Now that would take that table out of the worst such "slice". Thus, supposing the grouping is like:

| TxId - nearest 10 million | Tables Wrapping In Range |
|---------------------------+--------------------------|
|                         0 |                      250 |
|                         1 |                       80 |
|                         2 |                       72 |
|                         3 |                       30 |
|                         4 |                       21 |
|                         5 |                       35 |
|                         6 |                        9 |
|                         7 |                       15 |
|                         8 |                        8 |
|                         9 |                        7 |
|                        10 |                       22 |
|                        11 |                       35 |
|                        12 |                       14 |
|                        13 |                      135 |
|                        14 |                      120 |
|                        15 |                       89 |
|                        16 |                       35 |
|                        17 |                       45 |
|                        18 |                       60 |
|                        19 |                       25 |
|                        20 |                       15 |
|                        21 |                      150 |

Suppose current txid is 750, and the reason for there to be 250 tables in the current range is that there are a bunch of tables that get *continually* vacuumed. No need to worry about that range, and I'll presume that these are all in the past. In this example, it's crucial to, pretty soon, vacuum the 150 tables in partition #21, as they're getting near wraparound. Nothing to be improved on there. Though it would be kind of nice to start on the 150 as early as possible, so that we *might* avoid having them dominate autovac, as in Josh Berkus' example. But once those are done, the next "crucial" set, in partition #20, are a much smaller set of tables. It would be nice, at that point, to add in a few tables from partitions #13 and #14, to smooth out the burden. The ideal "steady state" would look like the following:

| TxId - nearest 10 million | Tables Wrapping In Range |
|---------------------------+--------------------------|
|                         0 |                      250 |
|                         1 |                       51 |
|                         2 |                       51 |
|                         3 |                       51 |
|                         4 |                       51 |
|                         5 |                       51 |
|                         6 |                       51 |
|                         7 |                       51 |
|                         8 |                       51 |
|                         9 |                       51 |
|                        10 |                       51 |
|                        11 |                       51 |
|                        12 |
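The slicing step described above is simple arithmetic; a rough sketch (all names hypothetical, not a proposed autovacuum patch) of bucketing tables by transactions-remaining-until-wraparound and picking the most crowded non-urgent slice for an idle worker to start draining early:

```c
#include <stdint.h>

#define SLICE_WIDTH 10000000u   /* 10M transactions per slice, as above */
#define NSLICES     32

/* which slice a table falls into, given how many transactions remain
 * before it needs its wraparound vacuum */
static uint32_t
slice_of(uint32_t xids_until_wraparound)
{
    return xids_until_wraparound / SLICE_WIDTH;
}

/* Given per-slice table counts, pick the most crowded slice beyond the
 * urgent ones; slices 0..min_slice-1 must be vacuumed regardless, so
 * there is nothing to smooth there.  Returns -1 if all are empty. */
static int
busiest_future_slice(const int counts[NSLICES], int min_slice)
{
    int         best = -1;
    int         best_count = 0;
    int         i;

    for (i = min_slice; i < NSLICES; i++)
    {
        if (counts[i] > best_count)
        {
            best = i;
            best_count = counts[i];
        }
    }
    return best;
}
```

Vacuuming one table out of the busiest slice pushes that table's next wraparound roughly 2^31 transactions away, thinning the cluster instead of recreating it.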
Re: [HACKERS] embedded list v2
On Thu, Jun 28, 2012 at 3:47 PM, Alvaro Herrera wrote: > > Excerpts from Andres Freund's message of jue jun 28 14:20:59 -0400 2012: > >> Looks good now? > > The one thing I dislike about this code is the names you've chosen. I > mean, ilist_s_stuff and ilist_d_stuff. I mean, why not just Slist_foo > and Dlist_bar, say? As far as I can tell, you've chosen the "i" prefix > because it's "integrated" or "inline", but this seems to me a rather > irrelevant implementation detail that's of little use to the callers. > > Also, I don't find so great an idea to have everything in a single file. > Is there anything wrong with separating singly and doubly linked lists > each to its own file? Other than you not liking it, I mean. As a > person who spends some time trying to untangle header dependencies, I > would appreciate keeping stuff as lean as possible. However, since > nobody else seems to have commented on this, maybe it's just me. Well, it's not JUST you. I agree entirely with all of your points. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, Jun 28, 2012 at 2:58 PM, Peter Geoghegan wrote: > On 28 June 2012 19:55, Robert Haas wrote: >> On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan >> wrote: >>> You think it will confuse users less if we start telling them to use >>> something that we have a very long history of telling them not to use? >> >> I don't buy this line of reasoning at all. If we're going to rename >> the GUC, it should be for accuracy, not PR value. If we start >> renaming something every time we improve it, we're going to go nuts. >> We improved lots of things in 9.2; they didn't all get renamed. > > That is a false equivalence, and you know it. A false equivalence between what and what? > Who said anything about PR value? I'm concerned with not confusing users. My point here is that we don't normally rename things because they work better than they used to. Whether that is called PR value or not confusing users, we don't normally do it. We sometimes rename things because they do something *different* than what they used to do. But not just because they work better. Anyway, it seems that no one other than you and I is very excited about renaming this for whatever reason, so maybe we should leave it at that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Excerpts from Josh Berkus's message of jue jun 28 15:03:15 -0400 2012: > 2) They have large partitioned tables, in which the partitions are > time-based and do not receive UPDATES after a certain date. Each > partition was larger than RAM. I think the solution to this problem has nothing to do with vacuum or autovacuum settings, and lots to do with cataloguing enough info about each of these tables to note that, past a certain point, they don't need any vacuuming at all. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] initdb check_need_password fix
If you run initdb -A md5, you get an error

initdb: must specify a password for the superuser to enable md5 authentication

In 9.2, when you run something like initdb --auth-local=peer --auth-host=md5, you still get that error, which I think is a mistake. The error should only be shown if both authentication choices are password-like. Hence I propose the attached patch.

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index bac8385..edd5d71 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2574,13 +2574,19 @@ struct tsearch_config_match
 }
 
 static void
-check_need_password(const char *authmethod)
+check_need_password(const char *authmethodlocal, const char *authmethodhost)
 {
-	if ((strcmp(authmethod, "md5") == 0 ||
-		 strcmp(authmethod, "password") == 0) &&
+	if ((strcmp(authmethodlocal, "md5") == 0 ||
+		 strcmp(authmethodlocal, "password") == 0) &&
+		(strcmp(authmethodhost, "md5") == 0 ||
+		 strcmp(authmethodhost, "password") == 0) &&
 		!(pwprompt || pwfilename))
 	{
-		fprintf(stderr, _("%s: must specify a password for the superuser to enable %s authentication\n"), progname, authmethod);
+		fprintf(stderr, _("%s: must specify a password for the superuser to enable %s authentication\n"), progname,
+				(strcmp(authmethodlocal, "md5") == 0 ||
+				 strcmp(authmethodlocal, "password") == 0)
+				? authmethodlocal
+				: authmethodhost);
 		exit(1);
 	}
 }
@@ -2792,8 +2798,7 @@ struct tsearch_config_match
 	check_authmethod_valid(authmethodlocal, auth_methods_local, "local");
 	check_authmethod_valid(authmethodhost, auth_methods_host, "host");
 
-	check_need_password(authmethodlocal);
-	check_need_password(authmethodhost);
+	check_need_password(authmethodlocal, authmethodhost);
 
 	if (strlen(pg_data) == 0)
 	{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] embedded list v2
Excerpts from Andres Freund's message of jue jun 28 14:20:59 -0400 2012: > Looks good now? The one thing I dislike about this code is the names you've chosen. I mean, ilist_s_stuff and ilist_d_stuff. I mean, why not just Slist_foo and Dlist_bar, say? As far as I can tell, you've chosen the "i" prefix because it's "integrated" or "inline", but this seems to me a rather irrelevant implementation detail that's of little use to the callers. Also, I don't find so great an idea to have everything in a single file. Is there anything wrong with separating singly and doubly linked lists each to its own file? Other than you not liking it, I mean. As a person who spends some time trying to untangle header dependencies, I would appreciate keeping stuff as lean as possible. However, since nobody else seems to have commented on this, maybe it's just me. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 2:51 PM, Tom Lane wrote: > Robert Haas writes: >> I tried this. At least on my fairly vanilla MacOS X desktop, an mlock >> for a larger amount of memory than was conveniently on hand (4GB, on a >> 4GB box) neither succeeded nor failed in a timely fashion but instead >> progressively hung the machine, apparently trying to progressively >> push every available page out to swap. After 5 minutes or so I could >> no longer move the mouse. After about 20 minutes I gave up and hit >> the reset button. So there's apparently no value to this as a >> diagnostic tool, at least on this platform. > > Fun. I wonder if other BSDen are as brain-dead as OSX on this point. > > It'd probably at least be worth filing a bug report with Apple about it. Just for fun, I tried writing a program that does power-of-two-sized malloc requests. The first one that failed - on my 4GB Mac, remember - was for 140737488355328 bytes. Yeah, that's right: 128 TB. According to the Google, there is absolutely no way of getting MacOS X not to overcommit like crazy. You can read the amount of system memory by using sysctl() to fetch hw.memsize, but it's not really clear how much that helps. We could refuse to start up if the shared memory allocation is >= hw.memsize, but even an amount slightly less than that seems like enough to send the machine into a tailspin, so I'm not sure that really gets us anywhere. One idea I had was to LOG the size of the shared memory allocation just before allocating it. That way, if your system goes into the tank, there will at least be something in the log. But that would be useless chatter for most users. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
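A sketch of the kind of probe Robert describes (his actual program isn't in the thread, so the shape here is an assumption): step through power-of-two malloc requests and report the first refused size. On an overcommitting kernel such as the OS X case above, nothing is refused until an absurd size like 128 TB, because untouched pages cost almost nothing.

```c
#include <stdlib.h>

/* Return the first power-of-two request size in [start, limit] that
 * malloc refuses, or 0 if every size up to limit "succeeds".  The
 * allocation is freed untouched, so on overcommitting systems very
 * little memory is ever actually committed. */
static size_t
first_malloc_failure(size_t start, size_t limit)
{
    size_t      sz;

    for (sz = start; sz != 0 && sz <= limit; sz *= 2)
    {
        void   *p = malloc(sz);

        if (p == NULL)
            return sz;          /* first refused size */
        free(p);
    }
    return 0;                   /* nothing failed up to limit */
}
```

This is also why checking malloc's return value tells you nothing useful about whether a shared-memory segment of that size can actually stay resident.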
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 20:00, Tom Lane wrote: > See VACUUM FULL for a recent counterexample --- we basically jacked it > up and drove a new implementation underneath, but we didn't change the > name, despite the fact that we were obsoleting a whole lot more folk > knowledge than exists around commit_delay. > > Of course, there were application-compatibility reasons not to rename > that command, which wouldn't apply so much to commit_delay. But still, > we have precedent for expecting that we can fix external documentation > rather than trying to code around whatever it says. I'm sure you're right to some degree. We can rely on some, maybe even most users to go and learn about these things, or hear about them somehow. But why should we do it that way, when what I've proposed is so much simpler, and has no plausible downside? http://archives.postgresql.org/pgsql-hackers/2005-06/msg01463.php Here, you say: "" The issue here is not "is group commit a good idea in the abstract?". It is "is the commit_delay implementation of the idea worth a dime?" ... and the evidence we have all points to the answer "NO". We should not let theoretical arguments blind us to this. "" What happens when someone reads this, or many other such statements were made before or since, when they are considering setting commit_delay now? The VACUUM FULL implementation is now very close to being nothing more than a synonym of CLUSTER. The only reason it happened that way was because it was easier than just deprecating VACUUM FULL. The old VACUUM FULL actually had something to recommend it. It seems unlikely that the same thing can be said for commit_delay. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Robert, Tom, Stephen, So, first, a description of the specific problem I've encountered at two sites. I'm working on another email suggesting workarounds and solutions, but that's going to take a bit longer. Observation --- This problem occurred on two database systems which shared the following characteristics: 1) They were running with default autovacuum & vacuum settings, except that one database had 5 workers instead of 3. 2) They have large partitioned tables, in which the partitions are time-based and do not receive UPDATES after a certain date. Each partition was larger than RAM. 3) The databases are old enough, and busy enough, to have been through XID wraparound at least a couple of times. Users reported that the database system became unresponsive, which was surprising since both of these DBs had been placed on hardware which was engineered for at least 100% growth over the current database size. On investigation, we discovered the following things: a) Each database had autovacuum_max_workers (one DB 5, one DB 3) doing anti-wraparound vacuum on several partitions simultaneously. b) The I/O created by the anti-wraparound vacuum was tying up the system. c) Terminating any individual autovacuum process didn't help, as it simply caused autovac to start on a different partition. So, the first question was: why was autovacuum wanting to anti-wraparound vacuum dozens of tables at the same time? A quick check showed that all of these partitions had nearly identical XID ages (as in less than 100,000 transactions apart), which all had exceeded autovacuum_max_freeze_age. How did this happen? I'm still not sure. One thought is: this is an artifact of the *previous* wraparound vacuums on each database. On cold partitions with old dead rows which have been through wraparound vacuum several times, this tends to result in the cold partitions converging towards having the same relfrozenxid over time; I'm still working on the math to prove this. 
Alternately, it's possible that a schema change to the partitioned tables gave them all the same effective relfrozenxid at some point in the past; both databases are still in development. So there are two parts to this problem, each of which needs a different solution: 1. Databases can inadvertently get to the state where many tables need wraparound vacuuming at exactly the same time, especially if they have many "cold" data partition tables. 2. When we do hit wraparound thresholds for multiple tables, autovacuum has no hesitation about doing autovacuum_max_workers worth of wraparound vacuum simultaneously, even when that exceeds the I/O capactity of the system. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Robert Haas writes: > On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan > wrote: >> You think it will confuse users less if we start telling them to use >> something that we have a very long history of telling them not to use? > I don't buy this line of reasoning at all. If we're going to rename > the GUC, it should be for accuracy, not PR value. If we start > renaming something every time we improve it, we're going to go nuts. > We improved lots of things in 9.2; they didn't all get renamed. See VACUUM FULL for a recent counterexample --- we basically jacked it up and drove a new implementation underneath, but we didn't change the name, despite the fact that we were obsoleting a whole lot more folk knowledge than exists around commit_delay. Of course, there were application-compatibility reasons not to rename that command, which wouldn't apply so much to commit_delay. But still, we have precedent for expecting that we can fix external documentation rather than trying to code around whatever it says. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 19:55, Robert Haas wrote: > On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan > wrote: >> You think it will confuse users less if we start telling them to use >> something that we have a very long history of telling them not to use? > > I don't buy this line of reasoning at all. If we're going to rename > the GUC, it should be for accuracy, not PR value. If we start > renaming something every time we improve it, we're going to go nuts. > We improved lots of things in 9.2; they didn't all get renamed. That is a false equivalence, and you know it. Who said anything about PR value? I'm concerned with not confusing users. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan wrote: > You think it will confuse users less if we start telling them to use > something that we have a very long history of telling them not to use? I don't buy this line of reasoning at all. If we're going to rename the GUC, it should be for accuracy, not PR value. If we start renaming something every time we improve it, we're going to go nuts. We improved lots of things in 9.2; they didn't all get renamed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
Robert Haas writes: > I tried this. At least on my fairly vanilla MacOS X desktop, an mlock > for a larger amount of memory than was conveniently on hand (4GB, on a > 4GB box) neither succeeded nor failed in a timely fashion but instead > progressively hung the machine, apparently trying to progressively > push every available page out to swap. After 5 minutes or so I could > no longer move the mouse. After about 20 minutes I gave up and hit > the reset button. So there's apparently no value to this as a > diagnostic tool, at least on this platform. Fun. I wonder if other BSDen are as brain-dead as OSX on this point. It'd probably at least be worth filing a bug report with Apple about it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 1:43 PM, Tom Lane wrote: > Magnus Hagander writes: >> On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund >> wrote: >>> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote: What happens if you mlock() it into memory - does that fail quickly? Is that not something we might want to do *anyway*? > >>> You normally can only mlock() mminor amounts of memory without changing >>> settings. Requiring to change that setting (aside that mlocking would be a >>> bad >>> idea imo) would run contrary to the point of the patch, wouldn't it? ;) > >> It would. I wasn't aware of that limitation :) > > The OSX man page says that mlock should give EAGAIN for a permissions > failure (ie, exceeding the rlimit) but > > [ENOMEM] Some portion of the indicated address range is not > allocated. There was an error faulting/mapping a > page. > > It might be helpful to try mlock (if available, which it isn't > everywhere) and complain about ENOMEM but not other errors. If course, > if the kernel checks rlimit first, we won't learn anything ... I tried this. At least on my fairly vanilla MacOS X desktop, an mlock for a larger amount of memory than was conveniently on hand (4GB, on a 4GB box) neither succeeded nor failed in a timely fashion but instead progressively hung the machine, apparently trying to progressively push every available page out to swap. After 5 minutes or so I could no longer move the mouse. After about 20 minutes I gave up and hit the reset button. So there's apparently no value to this as a diagnostic tool, at least on this platform. > I think it *would* be a good idea to mlock if we could. Setting shmem > large enough that it swaps has always been horrible for performance, > and in sysv-land there's no way to prevent that. But we can't error > out on permissions failure. I wouldn't mind having an option, but I think there'd have to be a way to turn it off for people trying to cram as many lightly-used VMs as possible onto a single server. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
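Tom's earlier suggestion in this thread - try mlock, complain about ENOMEM but not about rlimit-style failures - could be sketched as below. The helper names and the enum are hypothetical, not actual PostgreSQL code; the one piece of real API behavior relied on is the documented errno split, where ENOMEM suggests the range can't be made resident while EAGAIN/EPERM-style failures just mean the platform restricts mlock.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

typedef enum
{
    MLOCK_OK,                   /* everything fit in RAM */
    MLOCK_TOO_BIG,              /* ENOMEM: worth warning the admin */
    MLOCK_NOT_PERMITTED         /* rlimit/permissions: stay silent */
} mlock_result;

/* Map an mlock() errno to an action: only ENOMEM hints that shmem
 * exceeds what can stay resident; other errors must not cause noise,
 * since we cannot error out merely because mlock is restricted. */
static mlock_result
classify_mlock_errno(int err)
{
    return (err == ENOMEM) ? MLOCK_TOO_BIG : MLOCK_NOT_PERMITTED;
}

/* Diagnostic-only probe: try to lock the segment, then immediately
 * unlock it, so we learn whether it fits without pinning it. */
static mlock_result
try_lock_shmem(void *addr, size_t len)
{
    if (mlock(addr, len) == 0)
    {
        munlock(addr, len);
        return MLOCK_OK;
    }
    return classify_mlock_errno(errno);
}
```

As Robert's experiment shows, though, on at least OS X the mlock call itself can wedge the machine before returning, so this probe would likely need to remain optional.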
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 19:25, Kevin Grittner wrote: > Peter Geoghegan wrote: > >> Is anyone aware of a non-zero commit_delay in the wild today? I >> personally am not. > > http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php In that thread, Robert goes on to say to the OP that has set commit_delay: >On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire wrote: >> I don't think 1 second can be such a big difference for the bgwriter, >> but I might be wrong. > > Well, the default value is 200 ms. And I've never before heard of > anyone tuning it up, except maybe to save on power consumption on a > system with very low utilization. Nearly always you want to reduce > it. > >> The wal_writer makes me doubt, though. If logged activity was higher >> than 8MB/s, then that setting would block it all. >> I guess I really should lower it. > > Here again, you've set it to ten times the default value. That > doesn't seem like a good idea. I would start with the default and > tune down. So, let me rephrase my question: Is anyone aware of a non-zero commit_delay in the wild today with sensible reasoning behind it? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Peter Geoghegan wrote: > Is anyone aware of a non-zero commit_delay in the wild today? I > personally am not. http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php -Kevin
Re: [HACKERS] embedded list v2
On Thursday, June 28, 2012 06:23:05 PM Robert Haas wrote: > On Tue, Jun 26, 2012 at 7:26 PM, Andres Freund wrote: > > Attached are three patches: > > 1. embedded list implementation > > 2. make the list implementation usable without USE_INLINE > > 3. convert all callers to ilist.h away from dllist.h > > This code doesn't follow PostgreSQL coding style guidelines; in a > function definition, the name must start on its own line. Hrmpf. Yes. > Also, stuff like this is grossly unlike what's done elsewhere; use the same > formatting as e.g. foreach: > +#define ilist_d_reverse_foreach(name, ptr) for(name = > (ptr)->head.prev;\ > + name != &(ptr)->head;\ > + name = name->prev) It's not only unlike the rest, it's also ugly... Fixed. > And this is definitely NOT going to survive pgindent: > > +for(cur = head->head.prev; > +cur != &head->head; > +cur = cur->prev){ > + if(!(cur) || > + !(cur->next) || > + !(cur->prev) || > + !(cur->prev->next == cur) || > + !(cur->next->prev == cur)) > + { > + elog(ERROR, "double linked list is corrupted"); > + } > +} I changed the for() contents to one line. I don't think I can write anything that won't be changed by pgindent for the if()s contents. > And this is another meme we don't use elsewhere; add an explicit NULL test: > > + while ((cur = last->next)) Fixed. > And then there's stuff like this: > > + if(!cur){ > + elog(ERROR, "single linked list is corrupted"); > + } Fixed the places that I found. > Aside from the formatting issues, I think it would be sensible to > preserve the terminology of talking about the "head" and "tail" of a > list that we use elsewhere, instead of calling them the "front" and > "back" as you've done here. I would suggest that instead of add_after > and add_before you use the names insert_after and insert_before, which > I think is more clear; also instead of remove, I think you should say > delete, for consistency with pg_list.h. Heh. I've been poisoned from using C++ too much (those are partially STL names).
Changed. > A number of these inlined functions could be rewritten as macros - > e.g. ilist_d_has_next, ilist_d_has_prev, ilist_d_is_empty. Since some > things are written as macros anyway maybe it's good to do all the > one-liners that way, although I guess it doesn't matter that much. I find inline functions preferable because of multiple-evaluation issues. The macros are macros because of the dynamic return type and other similar issues which cannot be done in plain C. > I also agree with your XXX comment that ilist_s_remove should probably > be out of line. Done. Looks good now? Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From c3c80925e780489351c4de210364e55d223f02a8 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sun, 6 May 2012 00:26:35 +0200 Subject: [PATCH 1/2] Add embedded list interface Adds a single and a double linked list which can easily be embedded into other data structures and can be used without additional memory allocations. --- src/backend/lib/Makefile |2 +- src/backend/lib/ilist.c | 123 src/include/lib/ilist.h | 468 ++ 3 files changed, 592 insertions(+), 1 deletion(-) create mode 100644 src/backend/lib/ilist.c create mode 100644 src/include/lib/ilist.h diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile index 2e1061e..c94297a 100644 --- a/src/backend/lib/Makefile +++ b/src/backend/lib/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/lib top_builddir = ../../..
include $(top_builddir)/src/Makefile.global -OBJS = dllist.o stringinfo.o +OBJS = dllist.o stringinfo.o ilist.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c new file mode 100644 index 000..f78ac51 --- /dev/null +++ b/src/backend/lib/ilist.c @@ -0,0 +1,123 @@ +/*- + * + * ilist.c + * support for integrated/inline double and single linked lists + * + * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/lib/ilist.c + * + * NOTES + * + * This function only contains testing code for inline single/double linked + * lists. + * + *- + */ + +#include "postgres.h" + +/* + * If inlines aren't available include the function as defined in the header, + * but without 'static inline' defined. That way we do not have to duplicate + * their functionality. + */ +#ifndef USE_INLINE +#
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28 June 2012 18:57, Heikki Linnakangas wrote: > FWIW, I think commit_delay is just fine. In practice, it's mostly commits > that are affected, anyway. If we try to be more exact, I think it's just > going to be more confusing to users. You think it will confuse users less if we start telling them to use something that we have a very long history of telling them not to use? The docs say "it is rare that adding delay by increasing this parameter will actually improve performance". The book PostgreSQL High Performance says of commit_delay (and commit_siblings) "These are not effective parameters to tune in most cases. It is extremely difficult to show any speed-up by adjusting them, and quite easy to slow every transaction down by tweaking them". Who would be brave enough to use that in production? Unless you still think these statements are true, and I don't see why you would, it seems like really bad judgement not to change the name. Is anyone aware of a non-zero commit_delay in the wild today? I personally am not. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Posix Shared Mem patch
Andres Freund writes: > On Thursday, June 28, 2012 08:00:06 PM Tom Lane wrote: >> Well, the permissions angle is actually a good thing here. There is >> pretty much no risk of the mlock succeeding on a box that hasn't been >> specially configured --- and, in most cases, I think you'd need root >> cooperation to raise postgres' RLIMIT_MEMLOCK. So I think we could try >> to mlock without having any effect for 99% of users. The 1% who are >> smart enough to raise the rlimit to something suitable would get better, >> or at least more predictable, performance. > The heightened limit might just as well be targeted at another application and be > set up a bit too widely. I agree that it is useful, but I think it requires its > own setting, defaulting to off. Especially as there is no experience with > running a larger pg instance that way. [ shrug... ] I think you're inventing things to be afraid of, and ignoring a very real problem that mlock could fix. regards, tom lane
Re: [HACKERS] Posix Shared Mem patch
On Thursday, June 28, 2012 08:00:06 PM Tom Lane wrote: > Andres Freund writes: > > On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote: > >> I think it *would* be a good idea to mlock if we could. Setting shmem > >> large enough that it swaps has always been horrible for performance, > >> and in sysv-land there's no way to prevent that. But we can't error > >> out on permissions failure. > > > > It's also a very good method to get into hard-to-diagnose OOM situations > > though. Unless the machine is set up very carefully and only runs postgres I > > don't think it's acceptable to do that. > > Well, the permissions angle is actually a good thing here. There is > pretty much no risk of the mlock succeeding on a box that hasn't been > specially configured --- and, in most cases, I think you'd need root > cooperation to raise postgres' RLIMIT_MEMLOCK. So I think we could try > to mlock without having any effect for 99% of users. The 1% who are > smart enough to raise the rlimit to something suitable would get better, > or at least more predictable, performance. The heightened limit might just as well be targeted at another application and be set up a bit too widely. I agree that it is useful, but I think it requires its own setting, defaulting to off. Especially as there is no experience with running a larger pg instance that way. Greetings, Andres, for once the conservative one, Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Posix Shared Mem patch
Andres Freund writes: > On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote: >> I think it *would* be a good idea to mlock if we could. Setting shmem >> large enough that it swaps has always been horrible for performance, >> and in sysv-land there's no way to prevent that. But we can't error >> out on permissions failure. > It's also a very good method to get into hard-to-diagnose OOM situations > though. Unless the machine is set up very carefully and only runs postgres I > don't think it's acceptable to do that. Well, the permissions angle is actually a good thing here. There is pretty much no risk of the mlock succeeding on a box that hasn't been specially configured --- and, in most cases, I think you'd need root cooperation to raise postgres' RLIMIT_MEMLOCK. So I think we could try to mlock without having any effect for 99% of users. The 1% who are smart enough to raise the rlimit to something suitable would get better, or at least more predictable, performance. regards, tom lane
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 28.06.2012 15:18, Robert Haas wrote: On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs wrote: 2. Should we rename the GUCs, since this patch will cause them to control WAL flush in general, as opposed to commit specifically? Peter Geoghegan and Simon were arguing that we should retitle it to group_commit_delay rather than just commit_delay, but that doesn't seem to be much of an improvement in describing what the new behavior will actually be, and I am doubtful that it is worth creating a naming incompatibility with previous releases for a cosmetic change. I suggested wal_flush_delay, but there's no consensus on that. Opinions? Again, leave the naming of that for later. The idea of a rename came from yourself, IIRC. Deciding on a name is not such a hard thing that leaving it till later solves any problem. Let's just make a decision and be done with it. FWIW, I think commit_delay is just fine. In practice, it's mostly commits that are affected, anyway. If we try to be more exact, I think it's just going to be more confusing to users. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Posix Shared Mem patch
On Thursday, June 28, 2012 07:43:16 PM Tom Lane wrote: > Magnus Hagander writes: > > On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund wrote: > >> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote: > >>> What happens if you mlock() it into memory - does that fail quickly? > >>> Is that not something we might want to do *anyway*? > >> > >> You normally can only mlock() minor amounts of memory without changing > >> settings. Requiring to change that setting (aside that mlocking would be > >> a bad idea imo) would run contrary to the point of the patch, wouldn't > >> it? ;) > > > > It would. I wasn't aware of that limitation :) > > The OSX man page says that mlock should give EAGAIN for a permissions > failure (ie, exceeding the rlimit) but > > [ENOMEM] Some portion of the indicated address range is not > allocated. There was an error faulting/mapping a > page. > > It might be helpful to try mlock (if available, which it isn't > everywhere) and complain about ENOMEM but not other errors. Of course, > if the kernel checks rlimit first, we won't learn anything ... > > I think it *would* be a good idea to mlock if we could. Setting shmem > large enough that it swaps has always been horrible for performance, > and in sysv-land there's no way to prevent that. But we can't error > out on permissions failure. It's also a very good method to get into hard-to-diagnose OOM situations though. Unless the machine is set up very carefully and only runs postgres I don't think it's acceptable to do that. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Posix Shared Mem patch
Magnus Hagander writes: > On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund wrote: >> On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote: >>> What happens if you mlock() it into memory - does that fail quickly? >>> Is that not something we might want to do *anyway*? >> You normally can only mlock() minor amounts of memory without changing >> settings. Requiring to change that setting (aside that mlocking would be a >> bad >> idea imo) would run contrary to the point of the patch, wouldn't it? ;) > It would. I wasn't aware of that limitation :) The OSX man page says that mlock should give EAGAIN for a permissions failure (ie, exceeding the rlimit) but [ENOMEM] Some portion of the indicated address range is not allocated. There was an error faulting/mapping a page. It might be helpful to try mlock (if available, which it isn't everywhere) and complain about ENOMEM but not other errors. Of course, if the kernel checks rlimit first, we won't learn anything ... I think it *would* be a good idea to mlock if we could. Setting shmem large enough that it swaps has always been horrible for performance, and in sysv-land there's no way to prevent that. But we can't error out on permissions failure. regards, tom lane
[HACKERS] Re: Patch-2 (2-move-continuation-record-to-page-header.patch) WAL Format Changes
On 28.06.2012 17:40, Amit Kapila wrote: 1. @@ -693,7 +693,6 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) { XLogCtlInsert *Insert =&XLogCtl->Insert; XLogRecord *record; -XLogContRecord *contrecord; XLogRecPtrRecPtr; XLogRecPtrWriteRqst; uint32freespace; @@ -1082,9 +1081,7 @@ begin:; curridx = Insert->curridx; /* Insert cont-record header */ Insert->currpage->xlp_info |= XLP_FIRST_IS_CONTRECORD; -contrecord = (XLogContRecord *) Insert->currpos; -contrecord->xl_rem_len = write_len; -Insert->currpos += SizeOfXLogContRecord; +Insert->currpage->xlp_rem_len = write_len; After the above code changes, the comment "/* Insert cont-record header */" should be changed. Thanks, fixed. 2. Is XLP_FIRST_IS_CONTRECORD required after putting xl_rem_len in page header; can't we do handling based on xl_rem_len? Hmm, yeah, it's redundant now, we could use xl_rem_len > 0 to indicate a continued record. I thought I'd keep the flag to avoid unnecessary changes, to make life a bit easier for 3rd party tools that read the WAL, but I don't know if it really makes any difference. There is no shortage of xlog page header flag bits, so there's no hurry to get rid of it. Sorry for sending the observations in pieces rather than all together, as I am not sure how much I will be able to complete. So whatever I am able to read, I am sending you my doubts or observations. Thanks for the review, much appreciated! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 7:27 PM, Andres Freund wrote: > On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote: >> On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas wrote: >> > On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown wrote: >> >> On 64-bit Linux, if I allocate more shared buffers than the system is >> >> capable of reserving, it doesn't start. This is expected, but there's >> >> no error logged anywhere (actually, nothing logged at all), and the >> >> postmaster.pid file is left behind after this failure. >> > >> > Fixed. >> > >> > However, I discovered something unpleasant. With the new code, on >> > MacOS X, if you set shared_buffers to say 3200GB, the server happily >> > starts up. Or at least the shared memory allocation goes through just >> > fine. The postmaster then sits there apparently forever without >> > emitting any log messages, which I eventually discovered was because >> > it's busy initializing a billion or so spinlocks. >> > >> > I'm pretty sure that this machine does not have >3TB of virtual >> > memory, even counting swap. So that means that MacOS X has absolutely >> > no common sense whatsoever as far as anonymous shared memory >> > allocations go. Not sure exactly what to do about that. Linux is >> > more sensible, at least on the system I tested, and fails cleanly. >> >> What happens if you mlock() it into memory - does that fail quickly? >> Is that not something we might want to do *anyway*? > You normally can only mlock() minor amounts of memory without changing > settings. Requiring to change that setting (aside that mlocking would be a bad > idea imo) would run contrary to the point of the patch, wouldn't it? ;) It would. I wasn't aware of that limitation :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Posix Shared Mem patch
On Thursday, June 28, 2012 07:19:46 PM Magnus Hagander wrote: > On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas wrote: > > On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown wrote: > >> On 64-bit Linux, if I allocate more shared buffers than the system is > >> capable of reserving, it doesn't start. This is expected, but there's > >> no error logged anywhere (actually, nothing logged at all), and the > >> postmaster.pid file is left behind after this failure. > > > > Fixed. > > > > However, I discovered something unpleasant. With the new code, on > > MacOS X, if you set shared_buffers to say 3200GB, the server happily > > starts up. Or at least the shared memory allocation goes through just > > fine. The postmaster then sits there apparently forever without > > emitting any log messages, which I eventually discovered was because > > it's busy initializing a billion or so spinlocks. > > > > I'm pretty sure that this machine does not have >3TB of virtual > > memory, even counting swap. So that means that MacOS X has absolutely > > no common sense whatsoever as far as anonymous shared memory > > allocations go. Not sure exactly what to do about that. Linux is > > more sensible, at least on the system I tested, and fails cleanly. > > What happens if you mlock() it into memory - does that fail quickly? > Is that not something we might want to do *anyway*? You normally can only mlock() minor amounts of memory without changing settings. Requiring to change that setting (aside that mlocking would be a bad idea imo) would run contrary to the point of the patch, wouldn't it? ;) Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 7:15 PM, Robert Haas wrote: > On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown wrote: >> On 64-bit Linux, if I allocate more shared buffers than the system is >> capable of reserving, it doesn't start. This is expected, but there's >> no error logged anywhere (actually, nothing logged at all), and the >> postmaster.pid file is left behind after this failure. > > Fixed. > > However, I discovered something unpleasant. With the new code, on > MacOS X, if you set shared_buffers to say 3200GB, the server happily > starts up. Or at least the shared memory allocation goes through just > fine. The postmaster then sits there apparently forever without > emitting any log messages, which I eventually discovered was because > it's busy initializing a billion or so spinlocks. > > I'm pretty sure that this machine does not have >3TB of virtual > memory, even counting swap. So that means that MacOS X has absolutely > no common sense whatsoever as far as anonymous shared memory > allocations go. Not sure exactly what to do about that. Linux is > more sensible, at least on the system I tested, and fails cleanly. What happens if you mlock() it into memory - does that fail quickly? Is that not something we might want to do *anyway*? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 12:13 PM, Thom Brown wrote: > On 64-bit Linux, if I allocate more shared buffers than the system is > capable of reserving, it doesn't start. This is expected, but there's > no error logged anywhere (actually, nothing logged at all), and the > postmaster.pid file is left behind after this failure. Fixed. However, I discovered something unpleasant. With the new code, on MacOS X, if you set shared_buffers to say 3200GB, the server happily starts up. Or at least the shared memory allocation goes through just fine. The postmaster then sits there apparently forever without emitting any log messages, which I eventually discovered was because it's busy initializing a billion or so spinlocks. I'm pretty sure that this machine does not have >3TB of virtual memory, even counting swap. So that means that MacOS X has absolutely no common sense whatsoever as far as anonymous shared memory allocations go. Not sure exactly what to do about that. Linux is more sensible, at least on the system I tested, and fails cleanly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Covering Indexes
On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera wrote: > > Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012: > >> When this came up a couple weeks ago, the argument that was made for it >> was that you could attach non-significant columns to an index that *is* >> unique. That might or might not be a wide enough use-case to justify >> adding such a horrid kludge. > > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. That seems like an easy question to answer. How could it not disable HOT and still work correctly? > That could be a > significant difference. True, adding the covering column would not always be a win. But surely it is more likely to be a win when it can be done without adding yet another index that also needs to be maintained. Cheers, Jeff
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 8:26 AM, Robert Haas wrote: > 3. Consider adjusting the logic inside initdb. If this works > everywhere, the code for determining how to set shared_buffers should > become pretty much irrelevant. Even if it only works some places, we > could add 64MB or 128MB or whatever to the list of values we probe, so > that people won't get quite such a sucky configuration out of the box. > Of course there's no number here that will be good for everyone. This seems independent of the type of shared memory used and the limits on it. If it tried 64MB or 128MB and discovered that it couldn't obtain that much shared memory, it would automatically climb down to smaller values until it found one that works. I think the impediment to adopting larger defaults is not what happens if it can't get that much shared memory, but rather what happens if the machine doesn't have that much physical memory. The test server will still start (and so there will be no climb-down), leaving a default which is valid but just has horrid performance. Cheers, Jeff
Re: [HACKERS] pg_signal_backend() asymmetry
On Thu, Jun 28, 2012 at 6:48 AM, Noah Misch wrote: > On Thu, Jun 28, 2012 at 01:36:49AM -0700, Daniel Farina wrote: >> On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt wrote: >> > I have one nitpick related to the recent changes for >> > pg_cancel_backend() and pg_terminate_backend(). If you use these >> > functions as an unprivileged user, and try to signal a nonexistent >> > PID, you get: >> >> I think the goal there is to avoid leakage of the knowledge or >> non-knowledge of a given PID existing once it is deemed out of >> Postgres' control. Although I don't have a specific attack vector in >> mind for when one knows a PID exists a-priori, it does seem like an >> unnecessary admission on the behalf of other programs. > > I think it was just an oversight. I agree that these functions have no > business helping users probe for live non-PostgreSQL PIDs on the server, but > they don't do so and Josh's patch won't change that. I recommend committing > the patch. Users will be able to probe for live PostgreSQL PIDs, but > pg_stat_activity already provides those. Yeah, I figured that since pg_stat_activity already shows users PIDs, there should be no need to have pg_signal_backend() lie about whether a PID was a valid backend or not. And if for some reason we did want to lie, we should give an accurate message like "not valid backend, or must be superuser or have the same role...". I noticed that I neglected to update the comment which was in the non-superuser case: updated version attached. On Thu, Jun 28, 2012 at 5:22 AM, Magnus Hagander wrote: > Well, there are two sides to it - one is the message, the other is if > it should be ERROR or WARNING. My reading is that Josh is suggesting > we make them WARNING in both cases, right? Maybe I should have said I had "a few" nitpicks instead of just "one" :-) A more detailed summary of the little issues I'm aiming to fix: 1a.) 
SIGNAL_BACKEND_NOPERMISSION being used for both invalid PID and role mismatch, causing the "must be superuser or have ..." message to be printed in both cases for non-superusers 1b.) Since SIGNAL_BACKEND_NOPERMISSION is used for both duties, you'll get an ERROR instead of just a WARNING during plausibly-normal usage. For example, if you're running SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE usename = CURRENT_USER AND pid != pg_backend_pid(); you might be peeved if it ERROR'ed out with the bogus message claiming the process was owned by someone else, when the backend has just exited on its own. So even if we thought it was worth hiding from non-superusers whether the PID is invalid or belongs to someone else, I think the message should just be at WARNING level. 2a.) Existing code uses two calls to BackendPidGetProc() for non-superusers in the error-free case. 2b.) I think the comment "the check for whether it's a backend process is below" is a little misleading, since IsBackendPid() is just checking whether the return of BackendPidGetProc() is NULL, which has already been done. 3.) grammar fix for comment ("on it's own" -> "on its own") Josh pg_signal_backend_asymmetry.v2.diff Description: Binary data
Re: [HACKERS] Covering Indexes
Alvaro Herrera writes: > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. Surely it would *have* to, whether the columns are significant or not for uniqueness purposes. Else an index-only scan gives the wrong value after the update. regards, tom lane
Re: [HACKERS] Covering Indexes
On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera wrote: > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. That could be a > significant difference. I don't see Index-Only-Scans being something that will be used in "high churn" tables. So as long as the value of these "covering/included" fields is tied to index-only scans, maybe it isn't a problem? Of course, we have a hard time convincing people that the "index only" scans they want can't be "index only" because heap pages aren't "all visible"... a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] embedded list v2
On Tue, Jun 26, 2012 at 7:26 PM, Andres Freund wrote: > Attached are three patches: > 1. embedded list implementation > 2. make the list implementation usable without USE_INLINE > 3. convert all callers to ilist.h away from dllist.h This code doesn't follow PostgreSQL coding style guidelines; in a function definition, the name must start on its own line. Also, stuff like this is grossly unlike what's done elsewhere; use the same formatting as e.g. foreach: +#define ilist_d_reverse_foreach(name, ptr) for(name = (ptr)->head.prev;\ + name != &(ptr)->head;\ + name = name->prev) And this is definitely NOT going to survive pgindent: +for(cur = head->head.prev; +cur != &head->head; +cur = cur->prev){ + if(!(cur) || + !(cur->next) || + !(cur->prev) || + !(cur->prev->next == cur) || + !(cur->next->prev == cur)) + { + elog(ERROR, "double linked list is corrupted"); + } +} And this is another meme we don't use elsewhere; add an explicit NULL test: + while ((cur = last->next)) And then there's stuff like this: + if(!cur){ + elog(ERROR, "single linked list is corrupted"); + } Aside from the formatting issues, I think it would be sensible to preserve the terminology of talking about the "head" and "tail" of a list that we use elsewhere, instead of calling them the "front" and "back" as you've done here. I would suggest that instead of add_after and add_before you use the names insert_after and insert_before, which I think is more clear; also instead of remove, I think you should say delete, for consistency with pg_list.h. A number of these inlined functions could be rewritten as macros - e.g. ilist_d_has_next, ilist_d_has_prev, ilist_d_is_empty. Since some things are written as macros anyway maybe it's good to do all the one-liners that way, although I guess it doesn't matter that much. I also agree with your XXX comment that ilist_s_remove should probably be out of line. Apart from the above (mostly fairly superficial) concerns, I think this makes sense.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On 28 June 2012 16:26, Robert Haas wrote: > On Thu, Jun 28, 2012 at 10:11 AM, Tom Lane wrote: >> ... btw, I rather imagine that Robert has already noticed this, but OS X >> (and presumably other BSDen) spells the flag "MAP_ANON" not >> "MAP_ANONYMOUS". I also find this rather interesting flag there: >> >> MAP_HASSEMAPHORE Notify the kernel that the region may contain sema- >> phores and that special handling may be necessary. >> >> By "semaphore" I suspect they mean "spinlock", so we'd better turn this >> flag on where it exists. > > Sounds fine to me. Since no one seems opposed to the basic approach, > and everyone (I assume) will be happier to reduce the impact of > dealing with shared memory limits, I went ahead and committed a > cleaned-up version of the previous patch. Let's see what the > build-farm thinks. > > Assuming things go well, there are a number of follow-on things that > we need to do finish this up: > > 1. Update the documentation. I skipped this for now, because I think > that what we write there is going to be heavily dependent on how > portable this turns out to be, which we don't know yet. Also, it's > not exactly clear to me what the documentation should say if this does > turn out to work everywhere. Much of section 17.4 will become > irrelevant to most users, but I doubt we'd just want to remove it; it > could still matter for people running EXEC_BACKEND or running many > postmasters on the same machine or, of course, people running on > platforms where this just doesn't work, if there are any. > > 2. Update the HINT messages when shared memory allocation fails. > Maybe the new most-common-failure mode there will be too many > postmasters running on the same machine? We might need to wait for > some field reports before adjusting this. > > 3. Consider adjusting the logic inside initdb. If this works > everywhere, the code for determining how to set shared_buffers should > become pretty much irrelevant. 
Even if it only works some places, we > could add 64MB or 128MB or whatever to the list of values we probe, so > that people won't get quite such a sucky configuration out of the box. > Of course there's no number here that will be good for everyone. > > and of course > > 4. Fix any platforms that are now horribly broken. On 64-bit Linux, if I allocate more shared buffers than the system is capable of reserving, it doesn't start. This is expected, but there's no error logged anywhere (actually, nothing logged at all), and the postmaster.pid file is left behind after this failure. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Covering Indexes
Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012: > When this came up a couple weeks ago, the argument that was made for it > was that you could attach non-significant columns to an index that *is* > unique. That might or might not be a wide enough use-case to justify > adding such a horrid kludge. The other question is whether such an index would prevent an update from being HOT when the non-indexed values are touched. That could be a significant difference. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Covering Indexes
Andrew Dunstan writes: > On 06/28/2012 11:37 AM, Jeff Janes wrote: >> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler >> wrote: >>> I'm particularly intrigued by "covering indexes". For example: >>> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); >> I don't see the virtue of this in this case. Since the index is not >> unique, why not just put the index on (a,b,c,d) and be done with it? > Presumably the comparison function will be faster the fewer attributes > it needs to compare. Well, no, because queries will only be comparing the columns used in the query. Insertions would likely actually be faster with the extra columns considered significant, since we've known for a long time that btree doesn't especially like large numbers of identical index entries. When this came up a couple weeks ago, the argument that was made for it was that you could attach non-significant columns to an index that *is* unique. That might or might not be a wide enough use-case to justify adding such a horrid kludge. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On Thursday, June 28, 2012 06:01:10 PM Robert Haas wrote: > On Tue, Jun 26, 2012 at 8:13 PM, Andres Freund wrote: > > It even can be significantly higher than max_connections because > > subtransactions are only recognizable as part of their parent transaction > > upon commit. > > I've been wondering whether sub-XID assignment was going to end up on > the list of things that need to be WAL-logged to enable logical > replication. It would be nicer to avoid that if we can, but I have a > feeling that we may not be able to. I don't think it needs to. We only need that information during commit, and we have it there. If a subtxn aborts, a separate abort record is logged, so that's no problem. The 'merging' of the transactions would be slightly easier if we had the knowledge from the get-go, but that would add complications again in the case of rollbacks. Why do you think we need it? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On Tue, Jun 26, 2012 at 8:13 PM, Andres Freund wrote: > It even can be significantly higher than max_connections because > subtransactions are only recognizable as part of their parent transaction > upon commit. I've been wondering whether sub-XID assignment was going to end up on the list of things that need to be WAL-logged to enable logical replication. It would be nicer to avoid that if we can, but I have a feeling that we may not be able to. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Covering Indexes
On 06/28/2012 11:37 AM, Jeff Janes wrote: On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler wrote: Hackers, Very interesting design document for SQLite 4: http://www.sqlite.org/src4/doc/trunk/www/design.wiki I'm particularly intrigued by "covering indexes". For example: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); I don't see the virtue of this in this case. Since the index is not unique, why not just put the index on (a,b,c,d) and be done with it? Is there some advantage to be had in inventing a way to store c and d in the index without having them usable for indexing? Presumably the comparison function will be faster the fewer attributes it needs to compare. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] Row-Level Security
Florian Pflug writes: > On Jun28, 2012, at 17:29 , Tom Lane wrote: >> I believe it works today, because the executor only applies permissions >> checks during query startup. So those checks are executed while still >> within the SECURITY DEFINER context, and should behave as expected. >> Subsequently, the cursor portal is returned to caller and caller can >> execute it to completion, no problem. > Don't we (sometimes?) defer query startup to the first time FETCH is > called? There are things inside individual plan node functions that may only happen when the first row is demanded, but permissions checks are done in ExecutorStart(). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Covering Indexes
On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler wrote: > Hackers, > > Very interesting design document for SQLite 4: > > http://www.sqlite.org/src4/doc/trunk/www/design.wiki > > I'm particularly intrigued by "covering indexes". For example: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); I don't see the virtue of this in this case. Since the index is not unique, why not just put the index on (a,b,c,d) and be done with it? Is there some advantage to be had in inventing a way to store c and d in the index without having them usable for indexing? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] Row-Level Security
On Jun28, 2012, at 17:29 , Tom Lane wrote: > Kohei KaiGai writes: >> 2012/6/27 Florian Pflug : >>> Hm, what happens if a SECURITY DEFINER functions returns a refcursor? > >> My impression is, here is no matter even if SECURITY DEFINER function >> returns refcursor. > > I think Florian has a point: it *should* work, but *will* it? > > I believe it works today, because the executor only applies permissions > checks during query startup. So those checks are executed while still > within the SECURITY DEFINER context, and should behave as expected. > Subsequently, the cursor portal is returned to caller and caller can > execute it to completion, no problem. Don't we (sometimes?) defer query startup to the first time FETCH is called? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] Row-Level Security
Kohei KaiGai writes: > 2012/6/27 Florian Pflug : >> Hm, what happens if a SECURITY DEFINER functions returns a refcursor? > My impression is, here is no matter even if SECURITY DEFINER function > returns refcursor. I think Florian has a point: it *should* work, but *will* it? I believe it works today, because the executor only applies permissions checks during query startup. So those checks are executed while still within the SECURITY DEFINER context, and should behave as expected. Subsequently, the cursor portal is returned to caller and caller can execute it to completion, no problem. However, with RLS security-related checks will happen throughout the execution of the portal. They might do the wrong thing once the SECURITY DEFINER function has been exited. We might need to consider that a portal has a local value of "current_user", which is kind of a pain, but probably doable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux
On Thu, Jun 28, 2012 at 11:21 AM, Jeff Janes wrote: > Also, 20 transactions per connection is not enough of a run to make > any evaluation on. FWIW, I kicked off a looong benchmarking run on this a couple of days ago on the IBM POWER7 box, testing pgbench -S, regular pgbench, and pgbench --unlogged-tables at various client counts with and without the patch; three half-hour test runs for each test configuration. It should be done tonight and I will post the results once they're in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 10:11 AM, Tom Lane wrote: > ... btw, I rather imagine that Robert has already noticed this, but OS X > (and presumably other BSDen) spells the flag "MAP_ANON" not > "MAP_ANONYMOUS". I also find this rather interesting flag there: > > MAP_HASSEMAPHORE Notify the kernel that the region may contain sema- > phores and that special handling may be necessary. > > By "semaphore" I suspect they mean "spinlock", so we'd better turn this > flag on where it exists. Sounds fine to me. Since no one seems opposed to the basic approach, and everyone (I assume) will be happier to reduce the impact of dealing with shared memory limits, I went ahead and committed a cleaned-up version of the previous patch. Let's see what the build-farm thinks. Assuming things go well, there are a number of follow-on things that we need to do finish this up: 1. Update the documentation. I skipped this for now, because I think that what we write there is going to be heavily dependent on how portable this turns out to be, which we don't know yet. Also, it's not exactly clear to me what the documentation should say if this does turn out to work everywhere. Much of section 17.4 will become irrelevant to most users, but I doubt we'd just want to remove it; it could still matter for people running EXEC_BACKEND or running many postmasters on the same machine or, of course, people running on platforms where this just doesn't work, if there are any. 2. Update the HINT messages when shared memory allocation fails. Maybe the new most-common-failure mode there will be too many postmasters running on the same machine? We might need to wait for some field reports before adjusting this. 3. Consider adjusting the logic inside initdb. If this works everywhere, the code for determining how to set shared_buffers should become pretty much irrelevant. 
Even if it only works some places, we could add 64MB or 128MB or whatever to the list of values we probe, so that people won't get quite such a sucky configuration out of the box. Of course there's no number here that will be good for everyone. and of course 4. Fix any platforms that are now horribly broken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux
On Tue, Jun 26, 2012 at 3:58 PM, Nils Goroll wrote: >> It's >> still unproven whether it'd be an improvement, but you could expect to >> prove it one way or the other with a well-defined amount of testing. > > I've hacked the code to use adaptive pthread mutexes instead of spinlocks. see > attached patch. The patch is for the git head, but it can easily be applied > for > 9.1.3, which is what I did for my tests. > > This had disastrous effects on Solaris because it does not use anything > similar > to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE mutexes do > without syscalls for the simple case). > > But I was surprised to see that it works relatively well on linux. Here's a > glimpse of my results: > > hacked code 9.1.3: ... > tps = 485.964355 (excluding connections establishing) > original code (vanilla build on amd64) 9.1.3: ... > tps = 510.410883 (excluding connections establishing) It looks like the hacked code is slower than the original. That doesn't seem so good to me. Am I misreading this? Also, 20 transactions per connection is not enough of a run to make any evaluation on. How many cores are you testing on? > Regarding the actual production issue, I did not manage to synthetically > provoke > the saturation we are seeing in production using pgbench - I could not even > get > anywhere near the production load. What metrics/tools are you using to compare the two loads? What is the production load like? Each transaction has to update one of ten pgbench_branch rows, so you can't have more than ten transactions productively active at any given time, even though you have 768 connections. So you need to jack up the pgbench scale, or switch to using -N mode. Also, you should use -M prepared, otherwise you spend more time parsing and planning the statements than executing them. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] Row-Level Security
2012/6/27 Florian Pflug : > On Jun27, 2012, at 15:07 , Kohei KaiGai wrote: >> Probably, PlannedStmt->invalItems allows to handle invalidation of >> plan-cache without big code changes. I'll try to put a flag of user-id >> to track the query plan with RLS assumed, or InvalidOid if no RLS >> was applied in this plan. >> I'll investigate the implementation for more details. >> >> Do we have any other scenario that run a query plan under different >> user privilege rather than planner stage? > > Hm, what happens if a SECURITY DEFINER functions returns a refcursor? > > Actually, I wonder how we handle that today. If the executor is > responsible for permission checks, that wouldn't we apply the calling > function's privilege level in that case, at least of the cursor isn't > fetched from in the SECURITY DEFINER function? If I find some time, > I'll check... > My impression is that it does not matter even if a SECURITY DEFINER function returns a refcursor. A SECURITY DEFINER function (or a Trusted Procedure in sepgsql, or a set-UID program on Linux) provides unprivileged users with a particular "limited way" to access protected data. It means the owner of the security definer function accepts that it is reasonable to expose the protected data as long as unprivileged users access it via the function. It is the same reason we allow access through views for users who have privileges on the views but not on the underlying tables. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch-2 (2-move-continuation-record-to-page-header.patch) WAL Format Changes
While reading patch-2 (2-move-continuation-record-to-page-header.patch) of the WAL Format Changes (http://archives.postgresql.org/message-id/4FDA5136.6080206@enterprisedb.com), I had a few observations, which are summarized below: 1. @@ -693,7 +693,6 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) { XLogCtlInsert *Insert = &XLogCtl->Insert; XLogRecord *record; -XLogContRecord *contrecord; XLogRecPtr RecPtr; XLogRecPtr WriteRqst; uint32 freespace; @@ -1082,9 +1081,7 @@ begin:; curridx = Insert->curridx; /* Insert cont-record header */ Insert->currpage->xlp_info |= XLP_FIRST_IS_CONTRECORD; -contrecord = (XLogContRecord *) Insert->currpos; -contrecord->xl_rem_len = write_len; -Insert->currpos += SizeOfXLogContRecord; +Insert->currpage->xlp_rem_len = write_len; After the above code changes, the comment "/* Insert cont-record header */" should be changed. 2. Is XLP_FIRST_IS_CONTRECORD still required after putting xl_rem_len in the page header? Can't we do the handling based on xl_rem_len? Sorry for sending the observations in pieces rather than all together, as I am not sure how much I will be able to complete. So whatever I am able to read, I am sending you my doubts or observations. With Regards, Amit Kapila.
Re: [HACKERS] Posix Shared Mem patch
... btw, I rather imagine that Robert has already noticed this, but OS X (and presumably other BSDen) spells the flag "MAP_ANON" not "MAP_ANONYMOUS". I also find this rather interesting flag there: MAP_HASSEMAPHORE Notify the kernel that the region may contain sema- phores and that special handling may be necessary. By "semaphore" I suspect they mean "spinlock", so we'd better turn this flag on where it exists. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
On Thu, Jun 28, 2012 at 9:46 AM, Robert Haas wrote: > [ review ] Also: ../../../src/include/catalog/pg_event_trigger.h:34: error: expected specifier-qualifier-list before ‘int2’ This needs to be changed to int16 as a result of commit b8b2e3b2deeaab19715af063fc009b7c230b2336. alter.c:73: warning: implicit declaration of function ‘RenameEventTrigger’ That file needs to include commands/event_trigger.h. Please spell out DO_EVTTRIGGER as DO_EVENT_TRIGGER. Another random warning: plpy_main.c:195: warning: ‘retval’ may be used uninitialized in this function -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 8:57 AM, Robert Haas wrote: > On Thu, Jun 28, 2012 at 9:47 AM, Jon Nelson wrote: >> Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)? I >> seem to think that's what I did when I needed this functionality oh so >> many moons ago. > > From the reading I've done on this topic, that seems to be a trick > invented on Solaris that is considered grotty and awful by everyone > else. The thing is that you want the mapping to be shared with the > processes that inherit the mapping from you. You do *NOT* want the > mapping to be shared with EVERYONE who has mapped that file for any > reason, which is the usual meaning of MAP_SHARED on a file. Maybe > this happens to work correctly on some or all platforms, but I would > want to have some convincing evidence that it's more widely supported > (with the correct semantics) than MAP_ANON before relying on it. When I did this (I admit, it was on Linux but it was a long time ago) only the inherited file descriptor + mmap structure mattered - modifications were private to the process and its children - other apps always saw their "own" /dev/zero. A quick Google search suggests that - according to QNX, SCO, and some others - mmap'ing /dev/zero retains the expected privacy. Given how /dev/zero works I'd be very surprised if it was otherwise. I would love to see links that suggest that /dev/zero is nasty (or, in fact, in any way fundamentally different from mmap'ing /dev/zero) - feel free to send them to me privately to avoid polluting the list. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
Magnus Hagander writes: > On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas wrote: >> A related question is - if we do this - should we enable it only on >> ports where we've verified that it works, or should we just turn it on >> everywhere and fix breakage if/when it's reported? I lean toward the >> latter. > Depends on the amount of expected breakage, but I'd lean towards the > latter as well. If we don't turn it on, we won't find out whether it works. I'd say try it first and then back off if that proves necessary. I'd just as soon not see us write any fallback logic without evidence that it's needed. FWIW, even my pet dinosaur HP-UX 10.20 box appears to support mmap(MAP_SHARED|MAP_ANONYMOUS) --- at least the mmap man page documents both flags. I find it really pretty hard to believe that there are any machines out there that haven't got this and yet might be expected to run PG 9.3+. We should not go into it with an expectation of failure, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 9:47 AM, Jon Nelson wrote: > Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)? I > seem to think that's what I did when I needed this functionality oh so > many moons ago. From the reading I've done on this topic, that seems to be a trick invented on Solaris that is considered grotty and awful by everyone else. The thing is that you want the mapping to be shared with the processes that inherit the mapping from you. You do *NOT* want the mapping to be shared with EVERYONE who has mapped that file for any reason, which is the usual meaning of MAP_SHARED on a file. Maybe this happens to work correctly on some or all platforms, but I would want to have some convincing evidence that it's more widely supported (with the correct semantics) than MAP_ANON before relying on it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Thu, Jun 28, 2012 at 9:51 AM, Tom Lane wrote: > You are inventing problem details to fit > your solution. Well, what I'm actually doing is assuming that Josh's customers have the same problem that our customers do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Robert Haas writes: > On Thu, Jun 28, 2012 at 2:02 AM, Tom Lane wrote: >> Well, that's a fair point, but I don't think it has anything to do with >> Josh's complaint --- which AFAICT is about imposed load, not about >> failure to vacuum things that need vacuumed. > I think it's got everything to do with it. Josh could fix his problem > by increasing the cost limit and/or reducing the cost delay, but if he > did that then his database would get bloated... Josh hasn't actually explained what his problem is, nor what if any adjustments he made to try to ameliorate it. In the absence of data I refuse to rule out misconfiguration. But, again, to the extent that he's given us any info at all, it seemed to be a complaint about oversaturated I/O at max load, *not* about inability to complete vacuuming tasks as needed. You are inventing problem details to fit your solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Review] Add SPI_gettypmod() to return a field's typemod from a TupleDesc
On Mon, Jun 18, 2012 at 3:29 PM, Amit Kapila wrote: > [ review ] Chetan, this patch is waiting for an update from you. If you'd like this to get committed this CommitFest, we'll need an updated patch soon. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_signal_backend() asymmetry
On Thu, Jun 28, 2012 at 01:36:49AM -0700, Daniel Farina wrote: > On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt wrote: > > I have one nitpick related to the recent changes for > > pg_cancel_backend() and pg_terminate_backend(). If you use these > > functions as an unprivileged user, and try to signal a nonexistent > > PID, you get: > > I think the goal there is to avoid leakage of the knowledge or > non-knowledge of a given PID existing once it is deemed out of > Postgres' control. Although I don't have a specific attack vector in > mind for when one knows a PID exists a-priori, it does seem like an > unnecessary admission on the behalf of other programs. I think it was just an oversight. I agree that these functions have no business helping users probe for live non-PostgreSQL PIDs on the server, but they don't do so and Josh's patch won't change that. I recommend committing the patch. Users will be able to probe for live PostgreSQL PIDs, but pg_stat_activity already provides those. > Also, in pg_cancel_backend et al, PID really means "database session", > but as-is the marrying of PID and session is one of convenience, so I > think any message that communicates more than "that database session > does not exist" is superfluous anyhow. Perhaps there is a better > wording for the time being that doesn't implicate the existence or > non-existence of the PID? Perhaps, though I'm not coming up with anything. The message isn't wrong; the value is a PID independent of whether some process has that PID. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 6:05 AM, Magnus Hagander wrote: > On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas wrote: >> On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane wrote: >>> Robert Haas writes: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane wrote: > Would Posix shmem help with that at all? Why did you choose not to > use the Posix API, anyway? >>> It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. >>> >>> I see. Those are pretty good reasons ... >> >> So, should we do it this way? >> >> I did a little research and discovered that Linux 2.3.51 (released >> 3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS. >> That combination is documented to work beginning in Linux 2.4.0. How >> worried should we be about people trying to run PostgreSQL 9.3 on >> pre-2.4 kernels? If we want to worry about it, we could try mapping a >> one-page shared MAP_SHARED|MAP_ANONYMOUS segment first. If that >> works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS >> facility and try to allocate the whole segment plus a minimal sysv >> shm. If the single page allocation fails with EINVAL, we could fall >> back to allocating the entire segment as sysv shm. Why not just mmap /dev/zero (MAP_SHARED but not MAP_ANONYMOUS)? I seem to think that's what I did when I needed this functionality oh so many moons ago. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
On Sun, Jun 24, 2012 at 5:46 PM, Dimitri Fontaine wrote: > Here's an early revision 2 of the patch, not yet ready for commit, so > including the PL stuff still. What's missing is mainly a cache reference > leak to fix at DROP EVENT TRIGGER, but I failed to spot where it comes > from. > > As I fixed about all the other comments I though I might as well send in > the new version of the patch to get to another round of review. Some of the pg_dump hunks fail to apply for me; I guess this needs to be remerged. Spurious hunk: -query_hosts +query_hosts Spurious hunk: - * need deep copies, so they should be copied via list_copy() + * need deep copies, so they should be copied via list_copy(const ) There are a few remaining references to EVTG_FIRED_BEFORE / EVTG_FIRED_INSTEAD_OF which should be cleaned up. I suggest writing the grammar as CREATE EVENT TRIGGER name ON event_name EXECUTE... On a related note, evttype is still mentioned in the documentation, also. And CreateEventTrigStmt has a char timing field that can go. Incidentally, why do we not support an argument list here, as with ordinary triggers? I think the below hunk should get removed. Or else it should be surrounded by #ifdef rather than commented out. + /* +* Useful to raise WARNINGs for any DDL command not yet supported. +* + elog(WARNING, "Command Tag:%s", tag); + elog(WARNING, "Note to String: %s", nodeToString(parsetree)); +*/ Typo: + * where we probide object name and namespace separately and still want nice Typo: + * the easier code makes up fot it big time. psql is now very confused about what the slash command is. It's actually implemented as \dy, but the comments say \dev in several places, and slashUsage() documents it as \dct. InitializeEvtTriggerCommandCache still needs work. First, it ensures that CacheMemoryContext is non-NULL... after it's already stored the value of CacheMemoryContext into the HASHCTL. Obviously, the order there needs to be fixed. 
Also, I think you need to split this into two functions. There should be one function that gets called just once at startup time to CacheRegisterSyscacheCallback(), because we don't want to redo that every time the cache gets blown away. Then there should be another function that gets called when, and only when, someone needs to use the cache. That should create and populate the hash table. I think that all event triggers should fire in exactly alphabetical order by name. Now that we have the concept of multiple firing points, there's really no reason to distinguish between any triggers and triggers on specific commands. Eliminating that distinction will eliminate a bunch of complexity while making things *more* flexible for the user, who will be able to get a command trigger for a specific command to fire either before or after an ANY command trigger he has also defined rather than having the system enforce policy on him. plperl_validator still makes reference to CMDTRIGGER. Let's simplify this down to an enum with just one element, since that's all we're going to support for starters, and remove the related parser support for everything but command_start: +typedef enum TrigEvent +{ + E_CommandStart = 1, + E_SecurityCheck = 10, + E_ConsistencyCheck = 15, + E_NameLookup = 20, + E_CommandEnd = 51 +} TrigEvent; I think we should similarly rip out the vestigial support for supplying schema name, object name, and object ID. That's not going to be possible at command_start anyway; we can include that stuff in the patch that adds a later firing point (command end, or consistency check, perhaps). I think standard_ProcessUtility should have a shortcut that bypasses setting up the event context if there are no event triggers at all in the entire system, so that we do no extra work unless there's some potential benefit. 
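The suggested split of the cache setup could be sketched like this (function and variable names are illustrative, not from the patch): registration happens exactly once at startup, while the cache itself is rebuilt lazily on first use after an invalidation.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the two-function structure described above (hypothetical names). */
static bool callback_registered = false;
static bool cache_valid = false;

static void
InvalidateEventTriggerCache(void)       /* syscache-invalidation callback target */
{
    cache_valid = false;                /* next lookup rebuilds the hash table */
}

static void
EventTriggerCacheInitialize(void)       /* called just once, at startup */
{
    if (!callback_registered)
    {
        /* CacheRegisterSyscacheCallback(...) would go here */
        callback_registered = true;
    }
}

static void
BuildEventTriggerCacheIfNeeded(void)    /* called whenever the cache is used */
{
    if (!cache_valid)
    {
        /* create and populate the hash table here */
        cache_valid = true;
    }
}
```

This way an invalidation only marks the cache stale, and the (possibly expensive) rebuild is paid by the next actual user.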
It seems to me that we probably need a CommandCounterIncrement() after firing each event trigger, unless that's happening under the covers somewhere and I'm missing it. A good test case would be to have two event triggers. Have the first one insert a row into the table and check that the second one can see the row there. I suggest adding something like this to the regression tests. Instead of having giant switch statements match E_WhatEver tags to strings and vice versa, I think it would be much better to create an array someplace that contains all the mappings. Then you can write a convenience function that scans the array for a string and returns the E_WhatEver tag, and another convenience function that scans the array for an E_WhatEver tag and returns the corresponding string. Then all the knowledge of how those things map onto each other is in ONE place, which should make things a whole lot easier in terms of future code maintenance, not to mention minimizing the chances of bugs of oversight in the patch as it stands. :-)
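The mapping-array idea could look roughly like this (names are illustrative, not from the patch), with both lookup directions driven by one table:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum TrigEvent
{
    E_CommandStart = 1,
    E_CommandEnd = 51
} TrigEvent;

/* Single source of truth for tag <-> string mappings. */
static const struct
{
    TrigEvent   event;
    const char *name;
} event_map[] = {
    { E_CommandStart, "command_start" },
    { E_CommandEnd,   "command_end" },
};

static const char *
event_to_string(TrigEvent ev)
{
    for (size_t i = 0; i < sizeof(event_map) / sizeof(event_map[0]); i++)
        if (event_map[i].event == ev)
            return event_map[i].name;
    return NULL;                        /* unknown tag */
}

static TrigEvent
string_to_event(const char *name)
{
    for (size_t i = 0; i < sizeof(event_map) / sizeof(event_map[0]); i++)
        if (strcmp(event_map[i].name, name) == 0)
            return event_map[i].event;
    return (TrigEvent) 0;               /* not found */
}
```

Adding a new firing point then means adding one array entry, with no risk of the two switch statements drifting out of sync.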
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
> >> Parallelism is not free, ever, and particularly not here, where it has > >> the potential to yank the disk head around between five different > >> files, seeking like crazy, instead of a nice sequential I/O pattern on > >> each file in turn. > > > > Interesting point. Maybe what's going on here is that > > autovac_balance_cost() is wrong to suppose that N workers can each have > > 1/N of the I/O bandwidth that we'd consider okay for a single worker to > > eat. Maybe extra seek costs mean we have to derate that curve by some > > large factor. 1/(N^2), perhaps? I bet the nature of the disk subsystem > > affects this a lot, though. > > ...and this would have the same effect. Let's not assume that the > problem is that Josh doesn't know how to make autovacuum less > aggressive, because I'm pretty sure that ain't the issue. We may need reserved workers to work on system tables, at least, as a protection in case all workers are tied up for hours vacuuming 'log' tables. In the meantime, a table like pg_type can bloat a lot, for example. It might be that limiting the number of workers in 'antiwraparound' mode to (max_workers - round(max_workers/3)) is enough. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
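The proposed cap on antiwraparound workers could be sketched as follows (a hypothetical helper, just to make the arithmetic concrete): about a third of max_workers is held back for ordinary vacuuming.

```c
#include <assert.h>

/*
 * Hypothetical cap on workers allowed in antiwraparound mode:
 * max_workers - round(max_workers/3), never dropping below one.
 * Illustrative only; not proposed code.
 */
static int
antiwraparound_worker_cap(int max_workers)
{
    int reserved = (max_workers + 1) / 3;   /* integer round(max_workers/3) */
    int cap = max_workers - reserved;

    return (cap > 0) ? cap : 1;             /* always allow at least one */
}
```

With the default max_workers = 3 this leaves one worker free for small tables while two handle wraparound.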
Re: [HACKERS] Covering Indexes
On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler wrote: > Hackers, > > Very interesting design document for SQLite 4: > > http://www.sqlite.org/src4/doc/trunk/www/design.wiki > > I'm particularly intrigued by "covering indexes". For example: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > This allows the following query to do an index-only scan: > > SELECT c, d FROM table1 WHERE a=? AND b=?; > > Now that we have index-only scans in 9.2, I'm wondering if it would make > sense to add covering index support, too, where additional, unindexed columns > are stored alongside indexed columns. > > And I wonder if it would work well with expressions, too? > > David IIRC MS SQL also allows unindexed columns in the index. -- Rob Wultsch wult...@gmail.com
Re: [HACKERS] Covering Indexes
On 06/28/2012 02:16 PM, David E. Wheeler wrote: Hackers, Very interesting design document for SQLite 4: http://www.sqlite.org/src4/doc/trunk/www/design.wiki I'm particularly intrigued by "covering indexes". For example: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); This allows the following query to do an index-only scan: SELECT c, d FROM table1 WHERE a=? AND b=?; Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns. And I wonder if it would work well with expressions, too? David This is analogous to SQL Server's "include": CREATE NONCLUSTERED INDEX my_idx ON my_table (status) INCLUDE (someColumn, otherColumn) Which is useful, but bloats the index. -- Andreas Joseph Krogh - mob: +47 909 56 963 Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no Public key: http://home.officenet.no/~andreak/public_key.asc
Re: [HACKERS] pg_signal_backend() asymmetry
On Thu, Jun 28, 2012 at 10:36 AM, Daniel Farina wrote: > On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt wrote: >> Hi all, >> >> I have one nitpick related to the recent changes for >> pg_cancel_backend() and pg_terminate_backend(). If you use these >> functions as an unprivileged user, and try to signal a nonexistent >> PID, you get: > > I think the goal there is to avoid leakage of the knowledge or > non-knowledge of a given PID existing once it is deemed out of > Postgres' control. Although I don't have a specific attack vector in > mind for when one knows a PID exists a-priori, it does seem like an > unnecessary admission on the behalf of other programs. Well, there are two sides to it - one is the message, the other is if it should be ERROR or WARNING. My reading is that Josh is suggesting we make them WARNING in both cases, right? It should be possible to make it a WARNING without leaking information in the error message. > Also, in pg_cancel_backend et al, PID really means "database session", > but as-is the marrying of PID and session is one of convenience, so I > think any message that communicates more than "that database session > does not exist" is superfluous anyhow. Perhaps there is a better > wording for the time being that doesn't implicate the existence or > non-existence of the PID? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Thu, Jun 28, 2012 at 2:02 AM, Tom Lane wrote: > Robert Haas writes: >> It's just ridiculous to assert that it doesn't matter if all the >> anti-wraparound vacuums start simultaneously. It does matter. For >> one thing, once every single autovacuum worker is pinned down doing an >> anti-wraparound vacuum of some table, then a table that needs an >> ordinary vacuum may have to wait quite some time before a worker is >> available. > > Well, that's a fair point, but I don't think it has anything to do with > Josh's complaint --- which AFAICT is about imposed load, not about > failure to vacuum things that need vacuumed. Any scheme you care to > design will sometimes be running max_workers workers at once, and if > that's too much load there will be trouble. I grant that there can be > value in a more complex strategy for when to schedule vacuuming > activities, but I don't think that it has a lot to do with solving the > present complaint. I think it's got everything to do with it. Josh could fix his problem by increasing the cost limit and/or reducing the cost delay, but if he did that then his database would get bloated... >> Parallelism is not free, ever, and particularly not here, where it has >> the potential to yank the disk head around between five different >> files, seeking like crazy, instead of a nice sequential I/O pattern on >> each file in turn. > > Interesting point. Maybe what's going on here is that > autovac_balance_cost() is wrong to suppose that N workers can each have > 1/N of the I/O bandwidth that we'd consider okay for a single worker to > eat. Maybe extra seek costs mean we have to derate that curve by some > large factor. 1/(N^2), perhaps? I bet the nature of the disk subsystem > affects this a lot, though. ...and this would have the same effect. Let's not assume that the problem is that Josh doesn't know how to make autovacuum less aggressive, because I'm pretty sure that ain't the issue. 
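The two balancing curves under discussion can be made concrete with a small sketch (hypothetical helpers, not autovac_balance_cost() itself): the current behavior gives each of N workers 1/N of the I/O budget, while Tom's suggested derating charges 1/(N^2) to account for the extra seeking.

```c
#include <assert.h>

/* Current supposition: N workers share the budget linearly. */
static int
per_worker_limit_linear(int total_limit, int nworkers)
{
    return total_limit / nworkers;
}

/*
 * Tom's hypothetical derating: divide by N^2, so adding workers
 * shrinks the aggregate I/O demand rather than holding it constant.
 * Illustrative only; the right curve likely depends on the disk subsystem.
 */
static int
per_worker_limit_derated(int total_limit, int nworkers)
{
    return total_limit / (nworkers * nworkers);
}
```

With a budget of 200 and three workers, the linear rule hands each worker 66 while the derated rule hands each only 22 - aggregate demand falls to a third instead of staying constant.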
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs wrote: > Let's put this in now, without too much fiddling. There is already a > GUC to disable it, so measurements can be made to see if this causes > any problems. > > If there are problems, we fix them. We can't second guess everything. Fair enough. >> 2. Should we rename the GUCs, since this patch will cause them to >> control WAL flush in general, as opposed to commit specifically? >> Peter Geoghegan and Simon were arguing that we should retitle it to >> group_commit_delay rather than just commit_delay, but that doesn't >> seem to be much of an improvement in describing what the new behavior >> will actually be, and I am doubtful that it is worth creating a naming >> incompatibility with previous releases for a cosmetic change. I >> suggested wal_flush_delay, but there's no consensus on that. >> Opinions? > > Again, leave the naming of that for later. The idea of a rename came > from yourself, IIRC. Deciding on a name is not such a hard thing that leaving it till later solves any problem. Let's just make a decision and be done with it. >> The XLByteLE test four lines from the bottom should happen before we >> consider whether to sleep, because it's possible that we'll discover >> that our flush request has already been satisfied and exit without >> doing anything, in which case the sleep is a waste. The attached >> version adjusts the logic accordingly. > > I thought there already was a test like that earlier in the flush. > > I agree there needs to be one. There are several of them floating around; the point here is just to make sure that the sleep is after all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Covering Indexes
Hackers, Very interesting design document for SQLite 4: http://www.sqlite.org/src4/doc/trunk/www/design.wiki I'm particularly intrigued by "covering indexes". For example: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); This allows the following query to do an index-only scan: SELECT c, d FROM table1 WHERE a=? AND b=?; Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns. And I wonder if it would work well with expressions, too? David
Re: [HACKERS] Patch: Fix for a small tipo (space lost)
On Thu, Jun 28, 2012 at 4:44 AM, Alexander Lakhin wrote: > Fix for a small tipo (space lost) Thanks! Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 7:05 AM, Magnus Hagander wrote: > Do we really need a runtime check for that? Isn't a configure check > enough? If they *do* deploy postgresql 9.3 on something that old, > they're building from source anyway... [...] > > Could we actually turn *that* into a configure test, or will that be > too complex? I don't see why we *couldn't* make either of those things into a configure test, but it seems more complicated than a runtime test and less accurate, so I guess I'd be in favor of doing them at runtime or not at all. Actually, the try-a-one-page-mapping-and-see-if-you-get-EINVAL test is so simple that I really can't see any reason not to insert that defense. The fork-and-check-whether-it-really-works test is probably excess paranoia until we determine whether that's really a danger anywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Posix Shared Mem patch
On Thu, Jun 28, 2012 at 7:00 AM, Robert Haas wrote: > On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane wrote: >> Robert Haas writes: >>> On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane wrote: Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? >> >>> It seemed more complicated. If we use the POSIX API, we've got to >>> have code to find a non-colliding name for the shm, and we've got to >>> arrange to clean it up at process exit. Anonymous shm doesn't require >>> a name and goes away automatically when it's no longer in use. >> >> I see. Those are pretty good reasons ... > > So, should we do it this way? > > I did a little research and discovered that Linux 2.3.51 (released > 3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS. > That combination is documented to work beginning in Linux 2.4.0. How > worried should we be about people trying to run PostgreSQL 9.3 on > pre-2.4 kernels? If we want to worry about it, we could try mapping a > one-page shared MAP_SHARED|MAP_ANONYMOUS segment first. If that > works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS > facility and try to allocate the whole segment plus a minimal sysv > shm. If the single page allocation fails with EINVAL, we could fall > back to allocating the entire segment as sysv shm. Do we really need a runtime check for that? Isn't a configure check enough? If they *do* deploy postgresql 9.3 on something that old, they're building from source anyway... > A related question is - if we do this - should we enable it only on > ports where we've verified that it works, or should we just turn it on > everywhere and fix breakage if/when it's reported? I lean toward the > latter. Depends on the amount of expected breakage, but I'd lean towards the latter as well.
> If we find that there are platforms where (a) mmap is not supported or > (b) MAP_SHARED|MAP_ANON works but has the wrong semantics, we could > either shut off this optimization on those platforms by fiat, or we > could test not only that the call succeeds, but that it works > properly: create a one-page mapping and fork a child process; in the > child, write to the mapping and exit; in the parent, wait for the > child to exit and then test that we can read back the correct > contents. This would protect against a hypothetical system where the > flags are accepted but fail to produce the correct behavior. I'm > inclined to think this is over-engineering in the absence of evidence > that there are platforms that work this way. Could we actually turn *that* into a configure test, or will that be too complex? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Patch: Fix for a small tipo (space lost)
Fix for a small tipo (space lost) >From be61035d21512324aafd41074511625d97d17256 Mon Sep 17 00:00:00 2001 From: Alexander Lakhin Date: Thu, 28 Jun 2012 12:10:25 +0400 Subject: Fix for a small tipo (space lost). --- src/backend/utils/misc/guc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 965d325..33b58de 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2843,7 +2843,7 @@ static struct config_string ConfigureNamesString[] = { {"event_source", PGC_POSTMASTER, LOGGING_WHERE, - gettext_noop("Sets the application name used to identify" + gettext_noop("Sets the application name used to identify " "PostgreSQL messages in the event log."), NULL }, -- 1.7.9.5
Re: [HACKERS] pg_signal_backend() asymmetry
On Wed, Jun 27, 2012 at 5:38 PM, Josh Kupershmidt wrote: > Hi all, > > I have one nitpick related to the recent changes for > pg_cancel_backend() and pg_terminate_backend(). If you use these > functions as an unprivileged user, and try to signal a nonexistent > PID, you get: I think the goal there is to avoid leakage of the knowledge or non-knowledge of a given PID existing once it is deemed out of Postgres' control. Although I don't have a specific attack vector in mind for when one knows a PID exists a-priori, it does seem like an unnecessary admission on the behalf of other programs. Also, in pg_cancel_backend et al, PID really means "database session", but as-is the marrying of PID and session is one of convenience, so I think any message that communicates more than "that database session does not exist" is superfluous anyhow. Perhaps there is a better wording for the time being that doesn't implicate the existence or non-existence of the PID? -- fdr
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Wed, Jun 27, 2012 at 7:00 PM, Josh Berkus wrote: > I've seen this at two sites now, and my conclusion is that a single > autovacuum_max_workers isn't sufficient to cover the case of > wraparound vacuum. Nor can we just single-thread the wraparound vacuum > (i.e. just one worker) since that would hurt users who have thousands of > small tables. I have also witnessed very unfortunate un-smooth performance behavior around wraparound time. It seems like a bit of adaptive response in terms of allowed autovacuum throughput to number of pages requiring wraparound vacuuming would be one load off my mind. Getting slower and slower gradually, with some way to know that autovacuum has decided it should work harder and harder, is better than the brick wall that can sneak up currently. Count me as appreciative for improvements in this area. -- fdr
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 27 June 2012 23:01, Robert Haas wrote: > 1. Are there any call sites from which this should be disabled, either > because the optimization won't help, or because the caller is holding > a lock that we don't want them sitting on for a moment longer than > necessary? I think the worst case is that we're doing something like > splitting the root page of a btree, and now nobody can walk the btree > until the flush finishes, and here we are doing an "unnecessary" > sleep. I am not sure where I can construct a workload where this is a > real as opposed to a theoretical problem, though. So maybe we should > just not worry about it. It suffices for this to be an improvement > over the status quo; it doesn't have to be perfect. Thoughts? Let's put this in now, without too much fiddling. There is already a GUC to disable it, so measurements can be made to see if this causes any problems. If there are problems, we fix them. We can't second guess everything. > 2. Should we rename the GUCs, since this patch will cause them to > control WAL flush in general, as opposed to commit specifically? > Peter Geoghegan and Simon were arguing that we should retitle it to > group_commit_delay rather than just commit_delay, but that doesn't > seem to be much of an improvement in describing what the new behavior > will actually be, and I am doubtful that it is worth creating a naming > incompatibility with previous releases for a cosmetic change. I > suggested wal_flush_delay, but there's no consensus on that. > Opinions? Again, leave the naming of that for later. The idea of a rename came from yourself, IIRC. > The XLByteLE test four lines from the bottom should happen before we > consider whether to sleep, because it's possible that we'll discover > that our flush request has already been satisfied and exit without > doing anything, in which case the sleep is a waste. The attached > version adjusts the logic accordingly. I thought there already was a test like that earlier in the flush. 
I agree there needs to be one. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services