Re: [HACKERS] Platform-dependent(?) failure in timeout handling
On Tue, Nov 26, 2013 at 06:50:28PM -0500, Tom Lane wrote: I believe the reason for this is the mechanism that I speculated about in that previous thread. The platform is blocking SIGALRM while it executes handle_sig_alarm(), and that calls LockTimeoutHandler() which does kill(MyProcPid, SIGINT), and that SIGINT is being delivered immediately (or at least before we can get out of handle_sig_alarm). So now the platform blocks SIGINT, too, and calls StatementCancelHandler(), which proceeds to longjmp out of the whole signal-handling call stack. So the signal unblocking that would have happened after the handlers returned doesn't happen. In simpler cases we don't see an issue because the longjmp returns to the sigsetjmp(foo,1) call in postgres.c, which will result in restoring the signal mask that was active at that stack level, so we're all good. However, PG_TRY() uses sigsetjmp(foo,0), which means that no signal mask restoration happens if we catch the longjmp and don't ever re-throw it. Which is exactly what happens in plpgsql because of the EXCEPTION clause in the above example. I don't know how many platforms block signals during handlers in this way, but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least OpenBSD acts likewise, so that's a pretty darn large chunk of the world. Isn't this why sigsetjmp/siglongjmp were invented? Is there a situation where you don't want the signal mask restored? BTW, it seems that on BSD systems sigsetjmp == setjmp: http://www.gnu.org/software/libc/manual/html_node/Non_002dLocal-Exits-and-Signals.html Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
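To make the savemask distinction concrete, a minimal standalone C sketch (illustrative only, not PostgreSQL code): with sigsetjmp(env, 0) - what PG_TRY() uses - a signal blocked before siglongjmp() stays blocked after the jump.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf env;

int
main(void)
{
    sigset_t    block,
                cur;

    /* savemask = 0, as in PG_TRY(); pass 1 to save/restore the mask */
    if (sigsetjmp(env, 0) == 0)
    {
        /* simulate the platform blocking SIGINT during a handler */
        sigemptyset(&block);
        sigaddset(&block, SIGINT);
        sigprocmask(SIG_BLOCK, &block, NULL);
        siglongjmp(env, 1);
    }

    /* read back the current mask: SIGINT is still blocked here */
    sigprocmask(SIG_SETMASK, NULL, &cur);
    printf("SIGINT blocked after jump: %d\n", sigismember(&cur, SIGINT));
    return 0;
}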
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 11/27/13 01:21, Andres Freund wrote: On 2013-11-26 13:32:44 +0100, Andres Freund wrote: This seems to be the case since b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using scan_all to determine whether we can set new_frozen_xid. That's a slight pessimization when we scan a relation fully without explicitly scanning it in its entirety, but given this isn't the first bug around scanned_pages/rel_pages I'd rather go that way. The aforementioned commit wasn't primarily concerned with that. Alternatively we could just compute new_frozen_xid et al before the lazy_truncate_heap. I've gone for the latter in this preliminary patch. Not increasing relfrozenxid after an initial data load seems like a bit of a shame. I wonder if we should just do scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages? Hmm, you did (scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you meant to do; the thing you did for relfrozenxid looks good to me. Does the attached look correct to you?

- Heikki

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6778c7d..6688ab3 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -178,7 +178,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	int			usecs;
 	double		read_rate,
 				write_rate;
-	bool		scan_all;
+	bool		scan_all;		/* should we scan all pages? */
+	bool		scanned_all;	/* did we actually scan all pages? */
 	TransactionId freezeTableLimit;
 	BlockNumber new_rel_pages;
 	double		new_rel_tuples;
@@ -226,6 +227,21 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/*
+	 * Compute whether we actually scanned the whole relation. If we did, we
+	 * can adjust relfrozenxid and relminmxid.
+	 *
+	 * NB: We need to check this before truncating the relation, because that
+	 * will change ->rel_pages.
+	 */
+	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+	{
+		Assert(!scan_all);
+		scanned_all = false;
+	}
+	else
+		scanned_all = true;
+
+	/*
 	 * Optionally truncate the relation.
 	 *
 	 * Don't even think about it unless we have a shot at releasing a goodly
@@ -254,8 +270,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	 * is all-visible we'd definitely like to know that. But clamp the value
 	 * to be not more than what we're setting relpages to.
 	 *
-	 * Also, don't change relfrozenxid if we skipped any pages, since then we
-	 * don't know for certain that all tuples have a newer xmin.
+	 * Also, don't change relfrozenxid/relminmxid if we skipped any pages,
+	 * since then we don't know for certain that all tuples have a newer xmin.
 	 */
 	new_rel_pages = vacrelstats->rel_pages;
 	new_rel_tuples = vacrelstats->new_rel_tuples;
@@ -269,13 +285,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	if (new_rel_allvisible > new_rel_pages)
 		new_rel_allvisible = new_rel_pages;
 
-	new_frozen_xid = FreezeLimit;
-	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
-		new_frozen_xid = InvalidTransactionId;
-
-	new_min_multi = MultiXactCutoff;
-	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
-		new_min_multi = InvalidMultiXactId;
+	new_frozen_xid = scanned_all ? FreezeLimit : InvalidTransactionId;
+	new_min_multi = scanned_all ? MultiXactCutoff : InvalidMultiXactId;
 
 	vac_update_relstats(onerel,
						new_rel_pages,

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Platform-dependent(?) failure in timeout handling
Hi, On 2013-11-26 18:50:28 -0500, Tom Lane wrote: I don't know how many platforms block signals during handlers in this way, but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least OpenBSD acts likewise, so that's a pretty darn large chunk of the world. Just as a datapoint, I think on Linux (and likely on any system using sigaction()) that depends on how you set up the signal handler. There's a flag that controls that behaviour, namely SA_NODEFER. When using signal(2) the behaviour depends on the flavor of system we're running on and even on some #defines. I am not really sure whether using SA_NODEFER would be a good idea, even if we could achieve that behaviour on all platforms. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
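For illustration, a sketch of installing a handler with plain sigaction() and the flag in question (this is not how pqsignal() sets handlers up in the tree; the names here are illustrative):

#include <signal.h>
#include <string.h>

static void
handler(int signum)
{
    /* ... handler body ... */
}

static void
install(int signum, int nodefer)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);      /* no extra signals blocked in handler */
    /* with SA_NODEFER, signum itself is NOT blocked while handler runs */
    act.sa_flags = nodefer ? SA_NODEFER : 0;
    sigaction(signum, &act, NULL);
}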
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Tue, Nov 26, 2013 at 8:19 PM, Peter Geoghegan p...@heroku.com wrote: There are some visibility-related race conditions even still I also see this, sandwiched between the very many "deadlock detected" errors recorded over 6 or so hours (this is in chronological order, with no ERRORs omitted within the range shown):

ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected

This, along with the already-discussed "attempted to update invisible tuple", forms a full account of unexpected ERRORs seen during the extended run of the test case, so far. Since it took me a relatively long time to recreate this, it may not be trivial to do so. Unless you don't think it's useful to do so, I'm going to give this test a full 24 hours, just in case it shows up anything else like this. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Tue, Nov 26, 2013 at 03:25:44PM -0800, Kevin Grittner wrote: doc patch? Instead of the fix you mean, or with it? I don't see what we would change in the docs for the fix; the alternative might be to document that pg_dumpall output will fail to restore if any database (or the restoring user) has this property set.

a) Since pg_dump is not planned to be changed to deal with it, a hint in the pg_dump docs would be helpful. I can fully understand the argument that if the dump does NOT contain a CREATE DATABASE, the restore should indeed honor the existing database setting. In case it does contain a CREATE DATABASE statement the issue won't exist -- because the database will need to be gone before the restore.

b) If pg_upgrade is not fixed, then a hint in the docs is useful (see below).

pg_upgrade probably needs a doc patch too, or a reset like pg_dumpall. pg_upgrade is more like pg_dumpall, so a code patch seems most logical, again, assuming we decide that pg_dumpall is the right place for this reset of default_transaction_read_only. I don't have much of an opinion on the pg_upgrade aspect except this. The situation is this:

naive user:
- installs new PG version
- upgrades old cluster with one default_read_only database
- upgrade fails (or check fails - latter requires patch)
- user asks someone
- user unsets default_read_only
- upgrades old cluster
- upgrade succeeds
- user re-sets default_read_only

careful user:
- installs new PG version
- reads pg_upgrade docs of new version (requires doc patch)
- user unsets default_read_only
- upgrades old cluster
- upgrade succeeds
- user re-sets default_read_only

I can't for the life of me think of a scenario where one would go: "Oh, I've got a default_read_only database -- let's NOT upgrade the cluster." check. Passing the check but failing during the upgrade would not be very user-friendly. Again, I'm not sure that a doc patch is needed to say that pg_upgrade works even when this option is set. Why would anyone assume otherwise? Why would we list this property and not others? A doc patch is only needed if pg_upgrade does NOT work when some databases are default_read_only, because THAT is counterintuitive: to the naive user, upgrading from version to version is like making a copy, perhaps renaming a file or two. It doesn't intuitively suggest a need for WRITE access to the individual databases themselves. However, if the current behaviour is retained it would be very useful to document that and also document one or two ways around it (unsetting, PGOPTIONS, ...). Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote: On 11/27/13 01:21, Andres Freund wrote: On 2013-11-26 13:32:44 +0100, Andres Freund wrote: This seems to be the case since b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using scan_all to determine whether we can set new_frozen_xid. That's a slight pessimization when we scan a relation fully without explicitly scanning it in its entirety, but given this isn't the first bug around scanned_pages/rel_pages I'd rather go that way. The aforementioned commit wasn't primarily concerned with that. Alternatively we could just compute new_frozen_xid et al before the lazy_truncate_heap. I've gone for the latter in this preliminary patch. Not increasing relfrozenxid after an initial data load seems like a bit of a shame. I wonder if we should just do scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages? Hmm, you did (scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you meant to do; the thing you did for relfrozenxid looks good to me. I said it's a preliminary patch ;), really, I wasn't sure which of the two to go for. Does the attached look correct to you? Looks good. I wonder if we need to integrate any mitigating logic? Currently the corruption may only become apparent long after it occurred, that's pretty bad. And instructing people to run a vacuum after the upgrade will cause the corrupted data to be lost if they are already > 2^31 xids past. But integrating logic to fix things into heap_page_prune() looks somewhat ugly as well. Afaics the likelihood of the issue occurring on non-all-visible pages is pretty low, since they'd need to be skipped due to lock contention repeatedly. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-11-27 01:09:49 -0800, Peter Geoghegan wrote: On Tue, Nov 26, 2013 at 8:19 PM, Peter Geoghegan p...@heroku.com wrote: There are some visibility-related race conditions even still I also see this, sandwiched between the very many "deadlock detected" errors recorded over 6 or so hours (this is in chronological order, with no ERRORs omitted within the range shown):

ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: unable to fetch updated version of tuple
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected
ERROR: deadlock detected

This, along with the already-discussed "attempted to update invisible tuple", forms a full account of unexpected ERRORs seen during the extended run of the test case, so far. I think at least the "unable to fetch updated version of tuple" ERRORs are likely to be an unrelated 9.3+ bug that I've recently reported. Alvaro has a patch. C.f. 20131124000203.ga4...@alap2.anarazel.de Even the "deadlock detected" errors might be a fkey-locking issue, Bug #8434, but that's really hard to know without more details. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
Hi, Rajeev I tested the latest patch. My observation is: If we give a relative data directory path while registering the service, then service start fails. But the same works if the data directory is an absolute path. Looks like an existing issue. Maybe we need to internally convert the relative data path to an absolute one. Since the mentioned issue is an existing issue and not caused by this patch, can we take that as a separate defect and fix it? If so, then I can move this patch to ready for committer. I think so too. When booted as a service, the current directory seems to be C:/Windows/system32. So, a relative data directory path would have to be given relative to C:/Windows/system32. Thanks and Regards, Kumar Rajeev Rastogi Regards, Naoya --- Naoya Anzai Engineering Department NEC Soft, Ltd. E-Mail: anzai-na...@mxu.nes.nec.co.jp --- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Wed, Nov 27, 2013 at 1:20 AM, Andres Freund and...@2ndquadrant.com wrote: Even the deadlock detected errors might be a fkey-locking issue. Bug #8434, but that's really hard to know without more details. Thanks, I was aware of that but didn't make the connection. I've written a test-case that is designed to exercise one case that deadlocks like crazy - deadlocking is the expected, correct behavior. The deadlock errors are not in themselves suspicious. Actually, if anything I find it suspicious that there aren't more deadlocks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Name type in postgres
Hello. I want to understand: can we have a name without a null terminator? In src/backend/util/adt/name.c, in the nameout() function, there is: Name s = PG_GETARG_NAME(0); PG_RETURN_CSTRING(pstrdup(NameStr(*s))); What does the pstrdup() function do? Does this function create a null-terminated string from the name?
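For context, pstrdup() (see src/backend/utils/mmgr/mcxt.c) behaves like strdup() except that it allocates in the current memory context; a user-space sketch, with malloc() standing in for palloc():

#include <stdlib.h>
#include <string.h>

char *
pstrdup_sketch(const char *in)
{
    size_t      len = strlen(in) + 1;   /* includes the null terminator */
    char       *out = malloc(len);      /* the real pstrdup() uses palloc() */

    memcpy(out, in, len);               /* copy, terminator and all */
    return out;
}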
Re: [HACKERS] Assertions in PL/PgSQL
2013/11/27 Peter Eisentraut pete...@gmx.net On Tue, 2013-11-19 at 10:40 -0500, Robert Haas wrote: I think the goal was to get to RAISE ASSERT WHEN ...; then, if assertions are off, you do nothing; if they're on, you error. IF condition THEN RAISE... isn't a suitable surrogate in that case because you incur the overhead of testing the condition regardless. So if I do RAISE ASSERT WHEN condition and assertions are off, then condition wouldn't even be evaluated? But what about RAISE NOTICE WHEN, when log_min_messages is error? What about the side effects of the format string? This is all just getting too weird. I don't see anything wrong with considering a separate ASSERT command with its own semantics, like in many other programming languages. My objection against the ASSERT command was just one: it is too simple, relative to the cost of a possible collision from introducing a new (widely used) keyword. I can live with an ASSERT statement - but I expect at minimum the possibility to specify a level (failure, tracing, ...) and to specify a message related to the assert. An assert with only an expression is not enough. Regards Pavel
Re: [HACKERS] Status of FDW pushdowns
Hi Merlin, 2013/11/22 Merlin Moncure mmonc...@gmail.com: On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. [ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that. Pass-through mode mentioned in SQL/MED standard might be what he wants. happen to have a link handy? The SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as a backend of the dblink interface like this:

postgres=> SELECT dblink_connect('con1', 'server name of an FDW');
postgres=> SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...);

This provides a way to execute a query without defining a foreign table. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI wrote: * In get_relation_info(), the patch determines the branch condition keyattno != ObjectIdAttributeNumber. I fail to understand the meaning of this branch condition. Could you explain it? Literally answering, it means oid cannot be NULL (if it exists). Understood. Thank you for the detailed information. But I'm not sure it's worth complicating the code. What use cases are you thinking of? Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
On 27 November 2013, Naoya Anzai wrote: Hi, Rajeev I tested the latest patch. My observation is: If we give a relative data directory path while registering the service, then service start fails. But the same works if the data directory is an absolute path. Looks like an existing issue. Maybe we need to internally convert the relative data path to an absolute one. Since the mentioned issue is an existing issue and not caused by this patch, can we take that as a separate defect and fix it? If so, then I can move this patch to ready for committer. I think so too. When booted as a service, the current directory seems to be C:/Windows/system32. So, a relative data directory path would have to be given relative to C:/Windows/system32. OK. Then I am moving it to ready for committer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add transforms feature
On 11/15/2013 05:04 PM, Dimitri Fontaine wrote: Hi, Peter Eisentraut pete...@gmx.net writes: Rebased patch. No changes except that merge conflicts were resolved, and I had to add some Data::Dumper tweaks to the regression tests so that the results came out in consistent order on different versions of Perl. On the higher level design, the big question here is about selective behavior. As soon as you CREATE TRANSFORM FOR hstore LANGUAGE plperl then any plperl function will now receive its hstore arguments as a proper perl hash rather than a string. Any pre-existing plperl function with hstore arguments or return type then needs to be upgraded to handle the new types nicely, and some of those might not be under the direct control of the DBA running the CREATE TRANSFORM command, when using some plperl extensions for example. A mechanism allowing for the transform to only be used in some functions but not others might be useful. The simplest such mechanism I can think of is modeled against the PL/Java classpath facility as specified in the SQL standard: you attach a classpath per schema. If we start adding granularity, then why not go all the way? I mean, we could do it in the following way:

1) create named transforms

CREATE [DEFAULT] TRANSFORM xformname FOR type LANGUAGE lang (...details...);

2) use it when declaring a function

CREATE FUNCTION funcname(
    IN argname type [[USING] [TRANSFORM] xformname],
    INOUT argname type [[USING] [IN] [TRANSFORM] xformname]
                       [[USING] [OUT] [TRANSFORM] xformname],
    OUT argname type [[USING] [TRANSFORM] xformname],
    ...
) LANGUAGE lang $$ funcdef $$;

This approach allows full flexibility in using old packages, especially if we define the old transform behaviour as the DEFAULT TRANSFORM. Default transforms also allow an easy way to rewrite current type i/o conversions between languages into transforms. There are immediately a few transforms that I would find useful:

A) pass field data to the language as pairs of (typeoid, typebin) - this is useful for speed, especially if you do not want to use many of the passed arguments on most invocations

B) pass field data in as (typeoid, typebin), except do not de-toast values but pass in the toast ids, so the function is free to use only parts of toasted values as it needs

C) pass field data in as a string, probably the default behaviour for languages like pl/tcl and pl/sh

D) and then of course just having sensible transforms for extension types like the current patch provides.

Worst case, that I really don't think we need, would be addressing that per-argument: CREATE FUNCTION foo (hash hstore WITH TRANSFORM, kv hstore) … I certainly hope we don't need that, and sure can't imagine use cases for that level of complexity at the time of writing this review. A typical use case would be to have a short hstore always passed in as a dictionary and have another, possibly large, hstore passed in as a toast pointer. And if we want to have all type conversions between postgres and PLs re-written as transforms, then we do need named transforms, not just one per (pl, type) pair. Also, if we allow flexibility, then it is probably a good idea to implement full flexibility first and then look at making usage easy after that, instead of adding flexibility in small steps. Once we have per-argument transforms in place, we can look at setting per-schema defaults for ease of use.
As a large part of this is actually abstracting i/o conversions out of PL function code, I think we should look at allowing the conversion functions to be written in the target PL language in addition to C. I'll see if I can resurrect my patch for support of cstring and internal types in pl/python function defs for this. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 11/27/13 11:15, Andres Freund wrote: On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote: On 11/27/13 01:21, Andres Freund wrote: On 2013-11-26 13:32:44 +0100, Andres Freund wrote: This seems to be the case since b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using scan_all to determine whether we can set new_frozen_xid. That's a slight pessimization when we scan a relation fully without explicitly scanning it in its entirety, but given this isn't the first bug around scanned_pages/rel_pages I'd rather go that way. The aforementioned commit wasn't primarily concerned with that. Alternatively we could just compute new_frozen_xid et al before the lazy_truncate_heap. I've gone for the latter in this preliminary patch. Not increasing relfrozenxid after an initial data load seems like a bit of a shame. I wonder if we should just do scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages? Hmm, you did (scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you meant to do, the thing you did for relfrozenxid looks good to me. I said it's a preliminary patch ;), really, I wasn't sure what of both to go for. Does the attached look correct to you? Looks good. Ok, committed and backpatched that. I wonder if we need to integrate any mitigating logic? Currently the corruption may only become apparent long after it occurred, that's pretty bad. And instructing people run a vacuum after the upgrade will cause the corrupted data being lost if they are already > 2^31 xids past. Ugh :-(. Running vacuum after the upgrade is the right thing to do to prevent further damage, but you're right that it will cause any already-wrapped around data to be lost forever. Nasty. But integrating logic to fix things into heap_page_prune() looks somewhat ugly as well. I think any mitigating logic we might add should go into vacuum. It should be possible for a DBA to run a command, and after it's finished, be confident that you're safe. That means vacuum. Afaics the likelihood of the issue occuring on non-all-visible pages is pretty low, since they'd need to be skipped due to lock contention repeatedly. Hmm. If a page has its visibility-map flag set, but contains a tuple that appears to be dead because you've wrapped around, vacuum will give a warning: page containing dead tuples is marked as all-visible in relation \"%s\" page %u. So I think if a manual VACUUM FREEZE passes without giving that warning, that vacuum hasn't destroyed any data. So we could advise to take a physical backup of the data directory, and run a manual VACUUM FREEZE on all databases. If it doesn't give a warning, you're safe from that point onwards. If it does, you'll want to recover from an older backup, or try to manually salvage just the lost rows from the backup, and re-index. Ugly, but hopefully rare in practice. Unfortunately that doesn't mean that you haven't already lost some data. Wrap-around could've already happened, and vacuum might already have run and removed some rows. You'll want to examine your logs and grep for that warning. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python: domain over array support
On Tue, Nov 26, 2013 at 07:12:00PM -0200, Rodolfo Campero wrote: 2013/11/26 Heikki Linnakangas hlinnakan...@vmware.com Oops, sorry about that. Fixed. Maybe you forgot to modify plpython_types_3.out Yes. Heikki, please fix plpython_types_3.out too. See attached patch. -- marko

diff --git a/src/pl/plpython/expected/plpython_types_3.out b/src/pl/plpython/expected/plpython_types_3.out
index 25331f2..e104356 100644
--- a/src/pl/plpython/expected/plpython_types_3.out
+++ b/src/pl/plpython/expected/plpython_types_3.out
@@ -664,6 +664,9 @@ SELECT * FROM test_type_conversion_array_error();
 ERROR: return value of function with array return type is not a Python sequence
 CONTEXT: while creating return value
 PL/Python function test_type_conversion_array_error
+--
+-- Domains over arrays
+--
 CREATE DOMAIN ordered_pair_domain AS integer[] CHECK (array_length(VALUE,1)=2 AND VALUE[1] < VALUE[2]);
 CREATE FUNCTION test_type_conversion_array_domain(x ordered_pair_domain) RETURNS ordered_pair_domain AS $$
 plpy.info(x, type(x))

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote: Ok, committed and backpatched that. Thanks. I wonder if we need to integrate any mitigating logic? Currently the corruption may only become apparent long after it occurred, that's pretty bad. And instructing people to run a vacuum after the upgrade will cause the corrupted data to be lost if they are already > 2^31 xids past. Ugh :-(. Running vacuum after the upgrade is the right thing to do to prevent further damage, but you're right that it will cause any already-wrapped around data to be lost forever. Nasty. But integrating logic to fix things into heap_page_prune() looks somewhat ugly as well. I think any mitigating logic we might add should go into vacuum. It should be possible for a DBA to run a command, and after it's finished, be confident that you're safe. That means vacuum. Well, heap_page_prune() is the first thing that's executed by lazy_scan_heap(), that's why I was talking about it. So anything we do needs to happen in there or before. Afaics the likelihood of the issue occurring on non-all-visible pages is pretty low, since they'd need to be skipped due to lock contention repeatedly. Hmm. If a page has its visibility-map flag set, but contains a tuple that appears to be dead because you've wrapped around, vacuum will give a warning: page containing dead tuples is marked as all-visible in relation \"%s\" page %u. I don't think this warning is likely to be hit as the code stands - heap_page_prune() et al. will have removed all dead tuples already, right, and so has_dead_tuples won't be set. Independent from this, ISTM we should add an else if (PageIsAllVisible(page) && !all_visible) to those checks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python: domain over array support
On 11/27/13 14:15, Marko Kreen wrote: On Tue, Nov 26, 2013 at 07:12:00PM -0200, Rodolfo Campero wrote: 2013/11/26 Heikki Linnakangas hlinnakan...@vmware.com Oops, sorry about that. Fixed. Maybe you forgot to modify plpython_types_3.out Yes. Heikki, please fix plpython_types_3.out too. See attached patch. Ah, sorry. Committed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
I'll prepare a patch. 2013/11/27 Tom Lane t...@sss.pgh.pa.us Dean Rasheed dean.a.rash...@gmail.com writes: Actually the IF EXISTS in DROP TABLE now applies to the schema as well. Unfortunately there is currently no consistency across the various DROP commands --- some tolerate a non-existent schema, while others error out. Yeah. I think now that we've had this discussion, we should make them all tolerate a non-existent schema. I'm fine with having that happen over a series of patches rather than all at once though. Also amongst those that tolerate a non-existent schema, the resulting notices are not consistent --- some report the schema-qualified object name, while others just report the local object name. Less excited about this part, but on the whole I'd vote for the "schema no_such_schema does not exist" wording in cases where the schema isn't there. The other way is less specific for no very good reason. regards, tom lane
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 11/27/13 14:11, Andres Freund wrote: On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote: Afaics the likelihood of the issue occurring on non-all-visible pages is pretty low, since they'd need to be skipped due to lock contention repeatedly. Hmm. If a page has its visibility-map flag set, but contains a tuple that appears to be dead because you've wrapped around, vacuum will give a warning: page containing dead tuples is marked as all-visible in relation \"%s\" page %u. I don't think this warning is likely to be hit as the code stands - heap_page_prune() et al. will have removed all dead tuples already, right, and so has_dead_tuples won't be set. It might be a good idea to add such a warning to heap_page_prune(). Or also emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0. Independent from this, ISTM we should add an else if (PageIsAllVisible(page) && !all_visible) to those checks. Can you elaborate, where should that be added? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 2013-11-27 14:45:25 +0200, Heikki Linnakangas wrote: On 11/27/13 14:11, Andres Freund wrote: I don't think this warning is likely to be hit as the code stands - heap_page_prune() et al. will have removed all dead tuples already, right, and so has_dead_tuples won't be set. It might be a good idea to add such a warning to heap_page_prune(). Or also emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0. Independent from this, ISTM we should add an else if (PageIsAllVisible(page) && !all_visible) to those checks. Can you elaborate, where should that be added? I was thinking of adding such a warning below the elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u", ...) but we cannot warn against that because GetOldestXmin() can go backwards... I think it's probably sufficient to set has_dead_tuples = true in the ItemIdIsDead() branch in lazy_scan_heap(). That should catch relevant actions from heap_page_prune(). Besides not warning against deletions from heap_page_prune(), the current warning logic is also buggy for tables without indexes...

		/*
		 * If there are no indexes then we can vacuum the page right now
		 * instead of doing a second scan.
		 */
		if (nindexes == 0 &&
			vacrelstats->num_dead_tuples > 0)
		{
			/* Remove tuples from heap */
			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
			has_dead_tuples = false;

That happens before the else if (PageIsAllVisible(page) && has_dead_tuples) check. With regard to fixing things up, ISTM the best bet is heap_prune_chain() so far. That's executed by vacuum and by opportunistic pruning, and we know we have the appropriate locks there. Looks relatively easy to fix things up there. Not sure if there are any possible routes to WAL-log this other than using log_newpage()? I am really not sure what the best course of action is :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
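For concreteness, a sketch of that suggestion against the 9.3-era lazy_scan_heap() loop; the surrounding lines are paraphrased from vacuumlazy.c, and the marked line is only the proposal under discussion, not committed code:

		else if (ItemIdIsDead(itemid))
		{
			/*
			 * DEAD line pointers (left behind by an earlier prune) are
			 * recorded for index cleanup; the page is not all-visible.
			 */
			lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
			all_visible = false;
			has_dead_tuples = true;	/* proposed: remember prune activity */
			continue;
		}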
Re: [HACKERS] Status of FDW pushdowns
Shigeru Hanada escribió: SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction between client. Besides it, I think it would be nice to allow arbitrary FDW as backend of dblink interface like this: postgres= SELECT dblink_connect('con1', 'server name of an FDW'); postgres= SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...); This provides a way to execute query without defining foreign table. Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Traffic jams in fn_extra
On Sun, Nov 24, 2013 at 4:21 PM, Simon Riggs si...@2ndquadrant.com wrote: Why do you need to do this dance with fn_extra? It's possible to allocate a hash table in a Transaction-lifetime memory context on first call into a function then cache things there. fn_extra gives a handle per function call site. It sounds to me like the complexity is coming not from having many Postgres functions but from having lots of infrastructure backing up those functions. So if many of their Postgres functions call a C function to do various things and the C function wants to cache something somewhere related to the object they've been passed then the natural thing to do is have the Postgres function pass fn_extra down to the C function but if they have many such C functions it gets a bit tricky. But you could declare fn_extra to be a hash table yourself since it's your Postgres function that gets to decide how fn_extra is going to be used. You could then pass that hash table down to the various C functions to cache state. However that might still be a bit odd. If you call the same C function twice from the same Postgres function it'll get the same hash table for both calls. fn_extra is per Postgres function call site. -- greg
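For readers following along, a minimal sketch of the fn_extra idiom being discussed (the cache struct and function names are invented for illustration); allocating in fn_mcxt ties the cache's lifetime to the FmgrInfo for that call site:

#include "postgres.h"
#include "fmgr.h"

typedef struct MyCache
{
	int			something;		/* whatever is expensive to recompute */
} MyCache;

PG_FUNCTION_INFO_V1(my_func);

Datum
my_func(PG_FUNCTION_ARGS)
{
	MyCache    *cache = (MyCache *) fcinfo->flinfo->fn_extra;

	if (cache == NULL)
	{
		/* first call at this call site: build and stash the cache */
		cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
									   sizeof(MyCache));
		fcinfo->flinfo->fn_extra = (void *) cache;
	}

	PG_RETURN_INT32(cache->something);
}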
Re: [HACKERS] Handling GIN incomplete splits
On 11/22/13 15:04, Michael Paquier wrote: On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a new version. To ease the review, I split the remaining patch again into two, where the first patch is just yet more refactoring. I also fixed some bugs in WAL logging and replay that I bumped into while testing. Cool. Here is the review of the two remaining patches. 1) More refactoring, general remarks: - Code compiles without warnings - Passes make check - If I got it correctly... this patch separates the part managing data to be inserted from ginbtree. I can understand the meaning behind that if we consider that GinBtree is used only to define methods and search conditions (flags and keys). However I am wondering if this does not make the code more complicated... Well, I think it's an improvement in readability to separate the insertion payload from the search information. With the second patch it becomes more important, because if an incompletely split page is encountered while you're inserting a value, you need to insert the downlink for the old incomplete split first, and then continue the insertion of the original value where you left off. That context switching becomes a lot easier when the value you're inserting is passed around separately. Particularly the flag isDelete that is only passed to ginxlogInsert merits a comment IMO. Note that I haven't read the 2nd patch when writing that :) Yeah, that gets removed in the second patch, which changes the WAL record format. :-) 1-2) Could it be possible to change the variable name of GinBtreeEntryInsertData *entry in entryIsEnoughSpace? entry->entry is kind of hard to apprehend... Renaming it to insertEntry, say. Another idea would be to rename entry in GinBtreeEntryInsertData to entryData or entryTuple. ok, renamed the variable to insertData. 1-3) This block of code is present two times:

+	if (!isleaf)
+	{
+		PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+		PostingItemSetBlockNumber(pitem, updateblkno);
+	}

Should the update of a downlink to point to the next page be a separate function? Doesn't seem any better to me. Thanks, committed this refactoring patch now. Will continue on the main patch. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
Alvaro Herrera alvhe...@2ndquadrant.com writes: Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query. I guess what's needed here is a kind of barrier that allows pushing a whole arbitrary subquery (with joins and quals and whatnot) down to the remote side. My current thinking about how to solve that would be to add a notion of FOREIGN VIEW in our system, which would basically implement that barrier and send the view definition on the remote, with known quals values as constants, or something like that. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
Bruce Momjian br...@momjian.us wrote: On Tue, Nov 26, 2013 at 03:25:44PM -0800, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: How are we handling breakage of pg_dump, not pg_dumpall? That was discussed. Do you have something to add? I am confused what we are patching. Are we patching pg_dump, pg_dumpall, or both? Just pg_dumpall.c. Can I see the full patch? It was attached to this post: http://www.postgresql.org/message-id/1385225082.8248.yahoomail...@web162901.mail.bf1.yahoo.com Are we propagating other settings from pg_dump to pg_dumpall, like statement_timeout? pg_dumpall output sets up the global objects (including their properties) and then does a \connect to each database, followed by the same output that pg_dump would generate for that database. That includes the SET statements for statement_timeout, etc. The patch does nothing to change what objects or properties the pg_dumpall output tries to set up, it just sets a property *on the current connection* to allow those statements to run without error. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Nov 27, 2013 at 12:56 AM, Amit Kapila amit.kapil...@gmail.com wrote: - There is a comment TODO: It would be nice to behave like the history and the source strings were concatenated, so that you could compress using the new data, too. If we're not already doing that, then how are we managing to compress WAL by more than one-quarter in the "hundred tiny fields, all changed" case? The algorithm is not doing concatenation of history and source strings; the hash table is formed just with history data, and then later if a match is not found the data is added to history, so this can be the reason for the above result. From the compressor's point of view, that's pretty much equivalent to behaving as if those strings were concatenated. The point is that there's a difference between using the old tuple's history entries to compress the new tuple and using the new tuple's own history to compress it. The former is delta-compression, which is what we're supposedly doing here. The latter is just plain compression. That doesn't *necessarily* make it a bad idea, but they are clearly two different things. The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress and divide it into chunks in the same way. Chunks that don't exist in the history data get copied to the output, while those that do get replaced with a reference to their position in the history data. I think this idea looks better than the current one and it will definitely improve some of the cases, but I am not sure we can win in all cases. We have tried one similar idea (reduce the size of the hash and eventually the comparison) by adding every 10th byte to the history lookup table rather than every byte. It improved most cases but not all cases (the "hundred tiny fields, all changed" and "hundred tiny fields, half changed" tests were still slow). Patch and results are at this link (refer to approach-1): http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kap...@huawei.com What you did there will, I think, tend to miss a lot of compression opportunities. Suppose for example that the old tuple is ABCDEFGHIJKLMNOP and the new tuple is xABCDEFGHIJKLMNOP. After copying one literal byte we'll proceed to copy 9 more, missing the fact that there was a long match available after the first byte. The advantage of the fingerprinting technique is that it's supposed to be resistant to that sort of thing. Now the tough question is what are the possible options for this patch and which one to pick:

a. optimize the encoding technique, so that it can improve results in most cases even if not all.
b. have a table-level option or GUC to enable/disable WAL compression.
c. use some heuristics to check if chances of compression are good, then only perform compression:
   1. apply this optimization for tuple sizes > 128 and < 2000.
   2. apply this optimization if the number of modified columns is less than 25% (some threshold number) of the total columns. I think we can get modified columns from the target entry and use it if triggers haven't changed that tuple. I remember earlier there were concerns that this value can't be trusted completely, but I think using it as a heuristic is not a problem, even if this number is not right in some cases.
d. forget about this optimization and reject the patch.

I think doing options 'b' and 'c' together can make this optimization usable in cases where it is actually useful. I agree that we probably want to do (b), and I suspect we want both a GUC and a reloption, assuming that can be done relatively cleanly. However, I think we should explore (a) more before we explore (c). I think there's a good chance that we can reduce the CPU overhead of this enough to feel comfortable having it enabled by default. If we proceed with heuristics as in approach (c), I don't think that's the end of the world, but I think there will be more corner cases where we lose and have to fiddle things manually. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
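To make the chunking step concrete, a toy C sketch of content-defined chunk boundaries (not from any patch; the constants, names, and the particular polynomial rolling hash are arbitrary illustrative choices):

#include <stddef.h>
#include <stdint.h>

#define CHUNK_BITS	4		/* boundaries on average every 2^4 = 16 bytes */
#define WINDOW		8		/* bytes contributing to the rolling hash */
#define BASE		257

/* Return the length of the first chunk of data: a boundary is declared
 * wherever the bottom CHUNK_BITS bits of the rolling hash are zero. */
static size_t
first_chunk_len(const uint8_t *data, size_t len)
{
	uint32_t	hash = 0;
	uint32_t	base_pow = 1;
	size_t		i;

	for (i = 0; i < WINDOW; i++)
		base_pow *= BASE;	/* BASE^WINDOW (mod 2^32), to roll old bytes out */

	for (i = 0; i < len; i++)
	{
		hash = hash * BASE + data[i];				/* roll the new byte in */
		if (i >= WINDOW)
			hash -= base_pow * data[i - WINDOW];	/* roll the oldest byte out */
		if (i + 1 >= WINDOW &&
			(hash & ((1u << CHUNK_BITS) - 1)) == 0)
			return i + 1;	/* bottom N bits zero: chunk boundary */
	}
	return len;				/* no boundary: the rest is one chunk */
}

Because a boundary depends only on the bytes in the window, inserting one byte at the front (the xABCDEFGHIJKLMNOP case above) shifts at most the first chunk, and all later chunks still match the history table.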
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Tue, Nov 26, 2013 at 10:12 PM, Gurjeet Singh gurj...@singh.im wrote: On Tue, Nov 26, 2013 at 9:42 PM, Peter Eisentraut pete...@gmx.net wrote: src/backend/postmaster/postmaster.c:2255: indent with spaces. +else src/backend/postmaster/postmaster.c:2267: indent with spaces. +break Not sure how that happened! Attached is the updated patch. This is a performance patch, so it should come with benchmark results demonstrating that it accomplishes its intended purpose. I don't see any. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
2013/11/26 Pavel Stehule pavel.steh...@gmail.com 2013/11/26 Peter Eisentraut pete...@gmx.net On 11/22/13, 3:26 AM, Pavel Stehule wrote: website is related to patch for 9.3 (I add note there) patch for 9.4 is fixed - and now with a small doc I think it would help if we considered the new border styles and the new line styles separately. I don't find the new border styles to be particularly useful. They just use up vertical screen space, which is usually more precious than horizontal space. Is there a situation where you would find these styles to be more useful than the existing ones? Keep in mind that pset is usually set permanently, so it wouldn't be practical to use a different border style depending on how the query results shape up (like \x auto). Now for the linestyles. I can see how some of them are attractive, but several of them have poor aesthetics, I think. I don't see a reason to accept 7 new styles just for fun. If I had to choose, I'd consider -double1 and -double4 to be acceptable. I am sending a reduced patch: I added double1, double4 and double5 - renamed to double1, double2, double3; support for border 3 and 4 is removed.

postgres=# \pset linestyle double1 \pset border 2 \l
Line style (linestyle) is double1.
Border style (border) is 2.
                                  List of databases
┌───────────┬──────────┬──────────┬─────────────┬─────────────┬───────────────────────┐
│   Name    │  Owner   │ Encoding │   Collate   │    Ctype    │   Access privileges   │
╞═══════════╪══════════╪══════════╪═════════════╪═════════════╪═══════════════════════╡
│ postgres  │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │                       │
│ template0 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres           │
│           │          │          │             │             │ postgres=CTc/postgres │
│ template1 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres           │
│           │          │          │             │             │ postgres=CTc/postgres │
└───────────┴──────────┴──────────┴─────────────┴─────────────┴───────────────────────┘
(3 rows)

postgres=# \pset linestyle double2 \pset border 2 \l
Line style (linestyle) is double2.
Border style (border) is 2.
                                  List of databases
╔═══════════╤══════════╤══════════╤═════════════╤═════════════╤═══════════════════════╗
║   Name    │  Owner   │ Encoding │   Collate   │    Ctype    │   Access privileges   ║
╟───────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────╢
║ postgres  │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │                       ║
║ template0 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres           ║
║           │          │          │             │             │ postgres=CTc/postgres ║
║ template1 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres           ║
║           │          │          │             │             │ postgres=CTc/postgres ║
╚═══════════╧══════════╧══════════╧═════════════╧═════════════╧═══════════════════════╝
(3 rows)

postgres=# \pset linestyle double3 \pset border 2 \l
Line style (linestyle) is double3.
Border style (border) is 2.
                                  List of databases
╔═══════════╦══════════╦══════════╦═════════════╦═════════════╦═══════════════════════╗
║   Name    ║  Owner   ║ Encoding ║   Collate   ║    Ctype    ║   Access privileges   ║
╠═══════════╬══════════╬══════════╬═════════════╬═════════════╬═══════════════════════╣
║ postgres  ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║                       ║
║ template0 ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║ =c/postgres           ║
║           ║          ║          ║             ║             ║ postgres=CTc/postgres ║
║ template1 ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║ =c/postgres           ║
║           ║          ║          ║             ║             ║ postgres=CTc/postgres ║
╚═══════════╩══════════╩══════════╩═════════════╩═════════════╩═══════════════════════╝
(3 rows)

Pavel

commit 47b3258db6e3e82399417f57ad551ad740348635
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Wed Nov 27 15:09:01 2013 +0100

    initial

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 264cfe6..6e74690 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2088,9 +2088,10 @@ lo_import 152801
       <para>
       Sets the border line drawing style to one of <literal>ascii</literal>, <literal>old-ascii</literal>
-      or <literal>unicode</literal>.
-      Unique abbreviations are allowed. (That would mean one
-      letter is enough.)
+      or <literal>unicode</literal>, <literal>double1</literal>,
+      <literal>double2</literal> or <literal>double3</literal>
+      (unicode styles). Unique abbreviations are allowed.
+      (That would mean one letter is enough.) The
Re: [HACKERS] Extension Templates S03E11
On 11/26/2013 10:07 PM, Dimitri Fontaine wrote: What I now think we should do is only grant superusers the privileges to install an extension from a template they own or is owned by another superuser. Say what? Superusers bypass all privileges by definition. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
Andres Freund and...@2ndquadrant.com wrote: On 2013-11-26 14:14:38 -0800, Kevin Grittner wrote: I happened to build in a shell that was still set up for the clang address sanitizer, and got the attached report. On a rerun it was repeatable. XLogInsert() seems to read past the end of a variable allocated on the stack in doPickSplit(). I haven't tried to analyze it past that, since this part of the code is unfamiliar to me. Yea, I've seen that one before as well and planned to report it at some point. The reason is the MAXALIGN()s in ACCEPT_RDATA_DATA(). That rounds up to 8-byte boundaries, while we've e.g. only added 2 bytes of slop to toDelete. Have you established whether having the CRC calculation read past the end of the buffer can cause problems on recovery or standby systems? Should we try to get this fixed by Monday? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
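For illustration, a self-contained sketch of the overread arithmetic (MAXALIGN here is simplified from the real c.h macro, with 8-byte alignment hard-coded):

#include <stdio.h>
#include <stdint.h>

#define MAXALIGN_OF	8
#define MAXALIGN(len) \
	(((uintptr_t) (len) + (MAXALIGN_OF - 1)) & ~((uintptr_t) (MAXALIGN_OF - 1)))

int
main(void)
{
	size_t		len = 3;

	/* a buffer holding len bytes, advertised as MAXALIGN(len): 3 vs 8 */
	printf("buffer holds %zu bytes, CRC reads %zu\n",
		   len, (size_t) MAXALIGN(len));
	return 0;
}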
Re: [HACKERS] Platform-dependent(?) failure in timeout handling
Tom Lane t...@sss.pgh.pa.us wrote: 3. Establish a coding rule that if you catch an error with PG_TRY() and don't re-throw it, you have to unblock signals in your PG_CATCH block. Could that be done in the PG_END_TRY macro? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
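For reference, a sketch of what rule 3 would mean at a call site -- risky_operation() and handle_failure() are hypothetical stand-ins, while PG_SETMASK and UnBlockSig are the existing macros; whether the unblocking could instead live inside PG_END_TRY (or PG_CATCH) is exactly Kevin's question:

	/* hypothetical call site following coding rule #3 */
	PG_TRY();
	{
		risky_operation();			/* may elog(ERROR) and longjmp here */
	}
	PG_CATCH();
	{
		/* we swallow the error, so restore the signal mask ourselves */
		PG_SETMASK(&UnBlockSig);
		handle_failure();
	}
	PG_END_TRY();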
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Nov 27, 2013 at 7:35 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Nov 27, 2013 at 12:56 AM, Amit Kapila amit.kapil...@gmail.com wrote: The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress and divide it into chunks in the same way. Chunks that don't exist in the history data get copied to the output, while those that do get replaced with a reference to their position in the history data. I think this idea looks better than current and it will definately improve some of the cases, but not sure we can win in all cases. We have tried one of the similar idea (reduce size of hash and eventually comparision) by adding every 10 bytes to the history lookup table rather than every byte. It improved most cases but not all cases (hundred tiny fields, all changed, hundred tiny fields, half changed test were still slow). Patch and results are at link (refer approach-1): http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kap...@huawei.com What you did there will, I think, tend to miss a lot of compression opportunities. Suppose for example that the old tuple is ABCDEFGHIJKLMNOP and the new tuple is xABCDEFGHIJKLMNOP. After copying one literal byte we'll proceed to copy 9 more, missing the fact that there was a long match available after the first byte. That is right, but one idea to try that out was to see if we can reduce CPU usage at cost of compression, but we found that it didn't completely eliminate that problem. The advantage of the fingerprinting technique is that it's supposed to be resistant to that sort of thing. Okay, one question arise here is that can it be better in terms of CPU usage as compare to when we have used hash function for every 10th byte, if you have a feeling that it can improve situation, I can try a prototype implementation of same to check the results. Now the tough question is what are the possible options for this patch and which one to pick: a. optimize encoding technique, so that it can improve results in most cases even if not all. b. have a table level option or guc to enable/disable WAL compression. c. use some heuristics to check if chances of compression are good, then only perform compression. 1. apply this optimization for tuple size 128 and 2000 2. apply this optimization if number of modified columns are less than 25% (some threshold number) of total columns. I think we can get modified columns from target entry and use it if triggers haven't changed that tuple. I remember earlier there were concerns that this value can't be trusted completely, but I think using it as a heuristic is not a problem, even if this number is not right in some cases. d. forget about this optimization and reject the patch. I think by doing option 'b' and 'c' together can make this optimization usable in cases where it is actually useful. I agree that we probably want to do (b), and I suspect we want both a GUC and a reloption, assuming that can be done relatively cleanly. 
However, I think we should explore (a) more before we explore (c).

Sure, but for exploring (a), the scope is a bit bigger. We have the below options to explore for (a):
1. try to optimize the existing algorithm as used in the patch, which we have tried, but of course we can spend some more time to see if anything more can be tried out.
2. try the fingerprint technique as suggested by you above.
3. try some other standard methods like vcdiff, lz4, etc.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
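To make the chunking scheme above concrete, here is a minimal, self-contained sketch of content-defined chunking with a rolling hash. It is not the pglz code or the patch under discussion: the window size, base, and bit count are illustrative assumptions, and the polynomial hash is a toy stand-in for a proper Rabin fingerprint.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WINDOW 16    /* bytes covered by the rolling hash */
#define BASE   257u  /* polynomial base */
#define NBITS  4     /* boundary when the low NBITS bits are zero */

static void
chunkify(const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    uint32_t base_pow = 1;      /* BASE^WINDOW (mod 2^32), to drop old bytes */
    size_t   chunk_start = 0;
    size_t   i;

    for (i = 0; i < WINDOW; i++)
        base_pow *= BASE;

    for (i = 0; i < len; i++)
    {
        /* slide the window: add the new byte, drop the oldest one */
        hash = hash * BASE + data[i];
        if (i >= WINDOW)
            hash -= base_pow * data[i - WINDOW];

        /* content-defined boundary: average chunk length is ~2^NBITS */
        if (i + 1 >= WINDOW && (hash & ((1u << NBITS) - 1)) == 0)
        {
            printf("chunk at offset %zu, length %zu\n",
                   chunk_start, i - chunk_start + 1);
            chunk_start = i + 1;
        }
    }
    if (chunk_start < len)
        printf("chunk at offset %zu, length %zu\n",
               chunk_start, len - chunk_start);
}

int
main(void)
{
    const char *s = "the quick brown fox jumps over the lazy dog, repeatedly";

    chunkify((const uint8_t *) s, strlen(s));
    return 0;
}

In the delta scheme Robert describes, the history (old tuple) and the new tuple would both be chunked with the same parameters; chunk hashes of the history go into a lookup table, and chunks of the new tuple that hit the table are emitted as back-references instead of literal bytes.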
[HACKERS] [HITB-Announce] #HITB2014AMS Call for Papers Now Open
Hi everyone - The Call for Papers for the 5th annual HITB Security Conference in Amsterdam is now open. #HITB2014AMS takes place at the Beurs van Berlage from the 27th - 30th of May 2014. The official conference hotel for the event is the Hilton DoubleTree.

As always we start with 2 days of hands-on technical trainings followed by a 2-day triple-track conference. However, 2014 also sees the introduction of a brand new addition to the HITB line up - the HITB Haxpo or hacker expo! With support and backing from iAmsterdam, HITB Haxpo is going to be unlike anything that's been done before. Think IT security exhibition meets Makerfaire with a generous touch of HITBSecConf flavor thrown in for good measure. What exactly does that mean? Imagine an area dedicated to hackerspaces; makers with 3D printers, laser cutters and other fabrication goodies coupled with TOOOL's Lock Picking Village, HITB and Mozilla's HackWEEKDAY developer hackathon, a hackerspaces challenge featuring LEGO Mindstorms EV3, our Capture the Flag 'live hacking' competition and topped off by a 3-day IT exhibition featuring Microsoft and Google as the main anchors. The best part? Entrance to the 3-day Haxpo is COMPLETELY FREE!

As 2014 marks our 5th year anniversary of HITB in EU, #HITB2014AMS will be a very special edition conference with an ALL WOMEN keynote line up:

Opening Keynote (29th May 2014): Katie Moussouris (Lead, Microsoft Security Response Center), Kristin Lovejoy (GM, IBM Security Services Division)
Closing Keynote (29th May 2014): Mischel Kwon (Former Director, US Computer Emergency Response Team)
Opening Keynote (30th May 2014): Pamela Fusco (Chief Information Security Officer, Apollo Group), Jennifer Steffens (Chief Executive Officer, IOActive)
Closing Keynote (30th May 2014): Jaya Baloo (Chief Information Security Officer, KPN Telecom)

We're looking for talks that are highly technical, but most importantly, material which is new, fresh and hasn't been presented previously.

Submission Deadlines: Round #1 selection: 1st week of January 2014; Round #2 selection: 1st week of February 2014. CFP submissions will be evaluated in 2 rounds. If all slots are filled in the first selection round, we will close the CFP early, so DON'T DELAY SUBMITTING.

HITB CFP: http://cfp.hackinthebox.org/
Event Website: http://haxpo.nl || http://conference.hitb.org (Opens 1st Dec)

Each accepted submission will entitle the speaker(s) to accommodation for 3 nights / 4 days and travel expense reimbursement up to EUR1200.00 per speaking slot.

Topics of interest include, but are not limited to, the following: Cloud Security, File System Security, 3G/4G/WIMAX Security, SS7/GSM/VoIP Security, Security of Medical Devices, Critical Infrastructure Security, Smartphone / Mobile Security, Smart Card and Physical Security, Network Protocols, Analysis and Attacks, Applications of Cryptographic Techniques, Side Channel Analysis of Hardware Devices, Analysis of Malicious Code / Viruses / Malware, Data Recovery, Forensics and Incident Response, Hardware based attacks and reverse engineering, Windows / Linux / OS X / *NIX Security Vulnerabilities, Next Generation Exploit and Exploit Mitigation Techniques, NFC, WLAN, GPS, HAM Radio, Satellite, RFID and Bluetooth Security.

WHITE PAPER: If your presentation is short-listed for inclusion into the conference program, a technical white paper must also be provided for review (3000 - 5000 words).
Your submissions will be reviewed by The HITB CFP Review Committee:

Charlie Miller (formerly Principal Research Consultant, Accuvant Labs)
Katie Moussouris, Senior Security Strategist, Microsoft
Itzik Kotler, Chief Technology Officer, Security Art
Cesar Cerrudo, Chief Technology Officer, IOActive
Jeremiah Grossman, Founder, Whitehat Security
Andrew Cushman, Senior Director, Microsoft
Saumil Shah, Founder CEO, Net-Square
Thanh 'RD' Nguyen, THC, VNSECURITY
Alexander Kornburst, Red Database
Fredric Raynal, QuarksLab
Shreeraj Shah, Founder, BlueInfy
Emmanuel Gadaix, Founder, TSTF
Andrea Barisani, Inverse Path
Philippe Langlois, TSTF
Ed Skoudis, InGuardians
Haroon Meer, Thinkst
Chris Evans, Google
Raoul Chiesa, TSTF/ISECOM
rsnake, SecTheory
Gal Diskin, Intel
Skyper, THC

Note: We do not accept product or vendor related pitches. If you would like to showcase your company's products or technology at HITB Haxpo (which also has its own set of speaking slots), please email i...@haxpo.nl or conferencei...@hackinthebox.org

Regards, Hafez Kamal Hack in The Box (M) Sdn. Bhd 36th Floor, Menara Maxis Kuala Lumpur City Centre 50088 Kuala Lumpur, Malaysia Tel: +603-26157299 Fax: +603-26150088 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Name type in postgres
mohsen soodkhah mohammadi mohsensoodk...@gmail.com writes: I want to understand: can we have a Name without a null terminator?

No. Possibly that was the original intention, but for many years the system has enforced that Name contains at least one trailing null byte, ie the maximum usable length of identifiers is NAMEDATALEN-1. So the code you quote is valid.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
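As an illustration of the invariant described above, here is a small self-contained sketch. NAMEDATALEN is 64 by default; name_copy is a hypothetical helper that mirrors, but is not, the backend's actual code.

#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL default */

typedef struct NameData
{
    char data[NAMEDATALEN];
} NameData;

/*
 * Copy an identifier into a Name, truncating so that at least one
 * trailing null byte always remains: the usable identifier length is
 * therefore at most NAMEDATALEN - 1.
 */
static void
name_copy(NameData *name, const char *str)
{
    strncpy(name->data, str, NAMEDATALEN - 1);  /* pads short strings with NULs */
    name->data[NAMEDATALEN - 1] = '\0';         /* enforce the trailing NUL */
}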
Re: [HACKERS] Status of FDW pushdowns
On Wed, Nov 27, 2013 at 4:20 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: Hi Merlin, 2013/11/22 Merlin Moncure mmonc...@gmail.com: On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote:

I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows.

By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever.

[ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that.

Pass-through mode mentioned in the SQL/MED standard might be what he wants.

happen to have a link handy?

The SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as the backend of the dblink interface, like this:

postgres=> SELECT dblink_connect('con1', 'server name of an FDW');
postgres=> SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...);

This provides a way to execute a query without defining a foreign table.

yeah. (thanks for indulging -- this is barely on topic I guess). if it were possible to create a supporting function (say, fdw_link) that could somehow interface with a previously established server, it could probably be worked out. Then all FDWs could leverage parameterization without having to copy and paste the pgsql-fdw qual push code. But that would be a fairly large break from the rest of the FDW syntax, and having to define the record at each call site is admittedly a bit of a headache.

Hm, another way to think about this would be to somehow abstract the qual push into a library so that it could be accessed by other FDWs if they opted in. This would address my chief complaint that only the pgsql-fdw driver (the only database for which we already have an in-core high-quality connection api) could tap the excellent work you've done. If this were even possible, it would probably result in more fdw API changes.

If my: SELECT * FROM big_sql_server_foreign_table WHERE id = x; was fast, that'd be pretty nice.

merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
2013/11/27 Dimitri Fontaine dimi...@2ndquadrant.fr: Alvaro Herrera alvhe...@2ndquadrant.com writes:

Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query.

I guess what's needed here is a kind of barrier that allows pushing a whole arbitrary subquery (with joins and quals and whatnot) down to the remote side.

Yes, a big problem is how to avoid parsing the remote query in the PG context. A bare query string (other than a string literal) is always parsed by the PG parser, but the remote side could have different syntax and semantics; as Dimitri says, we need to pass the whole arbitrary query string to the remote side as-is.

My current thinking about how to solve that would be to add a notion of FOREIGN VIEW in our system, which would basically implement that barrier and send the view definition to the remote, with known qual values as constants, or something like that.

I'm sorry but I don't see the point here. Do you mean that the user executes CREATE FOREIGN VIEW in advance and uses the view in a subsequent query? Or do we allow new syntax like WITH alias AS FOREIGN VIEW (remote query)?

I think it would be nice to support executing an ad-hoc remote query written in a syntax which is valid only on the remote data source through an FDW, and at the moment the dblink interface seems feasible for that purpose.

-- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
2013-11-23 22:01 keltezéssel, Tom Lane írta: Boszormenyi Zoltan z...@cybertec.at writes:

Attached is the patch that modified the command tag returned by the DECLARE CURSOR command. It returns DECLARE SCROLL CURSOR or DECLARE NO SCROLL CURSOR depending on the cursor's scrollable flag that can be determined internally even if neither is asked explicitly.

This does not strike me as an acceptable change. It will break any code that's expecting the existing command tag, for little or no benefit to most applications. Even if backwards compatibility were of no concern, I'm not convinced it's a good thing to expose the backend's internal choices about query plans used for cursors, which is what this is basically doing.

I saw code in the backend allowing a cursor to be scrollable, although it was not declared as such. How about ripping that out? That way there would be no incentive for lazy SQL coding using simple cursors. You can argue that it would also break application compatibility, but on the other hand, such code has always been buggy and should be fixed.

It is expected by the ECPG cursor readahead code.

And that doesn't sound like a sufficient excuse. You should only assume a cursor is scrollable if SCROLL was specified in the cursor declaration command, which it'd seem to me is something ECPG would or easily could know about commands it issues.

Yes, it can, and I have a patch in the series passing this info to ecpglib. I am also arguing for backward compatibility from a different angle: this small backend change would still allow using simple cursors in ECPG while using the cursor readahead. And it's not the first time drivers have had to adapt to new PostgreSQL major versions. If it were, I wouldn't have the courage to set a precedent either.

Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
On Wed, Nov 27, 2013 at 06:23:38AM -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: On 2013-11-26 14:14:38 -0800, Kevin Grittner wrote: I happened to build in a shell that was still set up for the clang address sanitizer, and got the attached report. On a rerun it was (Kevin, I saw no attachment.) repeatable. XLogInsert() seems to read past the end of a variable allocated on the stack in doPickSplit(). I haven't tried to analyze it past that, since this part of the code is unfamiliar to me. Yea, I've seen that one before as well and planned to report it at some point. The reason is the MAXALIGN()s in ACCEPT_RDATA_DATA(). That rounds up to 8byte boundaries, while we've e.g. only added 2bytes of slop to toDelete. Have you established whether having the CRC calculation read past the end of the buffer can cause problems on recovery or standby systems? Should we try to get this fixed by Monday? The threat is that rounding the read size up to the next MAXALIGN would cross into an unreadable memory page, resulting in a SIGSEGV. Every palloc chunk has MAXALIGN'd size under the hood, so the excess read of toDelete cannot cause a SIGSEGV. For a stack variable, it depends on the ABI. I'm not aware of an ABI where the four bytes past the end of this stack variable could be unreadable, which is not to claim I'm well-read on the topic. We should fix this in due course on code hygiene grounds, but I would not back-patch it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
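To see the arithmetic concretely, here is a self-contained sketch using a simplified version of the MAXALIGN macro (the real one lives in src/include/c.h; 8-byte alignment is an assumption here):

#include <stdint.h>
#include <stdio.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

int
main(void)
{
    char toDelete[2];   /* 2 bytes of slop, as in doPickSplit */

    /*
     * An ACCEPT_RDATA_DATA-style macro rounds the registered length up to
     * the next MAXALIGN boundary, so the CRC computation reads
     * MAXALIGN(2) = 8 bytes: 6 bytes beyond the variable itself.
     */
    printf("variable size: %zu, bytes read by CRC: %zu\n",
           sizeof(toDelete), (size_t) MAXALIGN(sizeof(toDelete)));
    return 0;
}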
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Alvaro Herrera escribió: Andres Freund escribió:

This seems simple to handle by adding the check you propose to the loop. Basically if the xmax doesn't match the xmin, we reached the end, there's nothing more to lock and we can return success without any further work: As mentioned in the thread for bug #8434, the complete working patch for this is attached.

b) Check whether a chain element actually aborted - currently we're only doing that in the HEAP_KEYS_UPDATED case, but that seems wrong (we can't check for committed tho!).

Let me point out that this is exactly the same code that would be affected by my proposed fix for #8434, which would have this check the updateXid in all cases, not only in KEYS_UPDATED as currently. I posted a patch for this problem in the thread about #8434.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

commit 1a4cadc729e724fa1aa6f260a4988b3615fccd1b
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Wed Nov 27 12:12:42 2013 -0300

    Compare new tuple's Xmin to previous Xmax while following an update chain

    Not doing so causes us to traverse an update chain that has been broken by
    concurrent page pruning. All other code that traverses update chains is
    careful to do this check, so replicate it here too. Failure to do so leads
    to erroneous CLOG, subtrans or multixact lookups.

    Per bug report by J Smith in
    cadfupgc5bmtv-yg9znxv-vcfkb+jprqs7m2oesqxam_4z1j...@mail.gmail.com
    diagnosed by Andres Freund.

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c3b2108..5eb45ff 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4819,6 +4819,7 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 				old_infomask;
 	TransactionId xmax,
 				new_xmax;
+	TransactionId priorXmax = InvalidTransactionId;
 
 	ItemPointerCopy(tid, &tupid);
 
@@ -4844,6 +4845,18 @@ l4:
 		CHECK_FOR_INTERRUPTS();
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
+		/*
+		 * Check the tuple XMIN against prior XMAX, if any.  If we reached
+		 * the end of the chain, we're done, so return success.
+		 */
+		if (TransactionIdIsValid(priorXmax) &&
+			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
+								 priorXmax))
+		{
+			UnlockReleaseBuffer(buf);
+			return HeapTupleMayBeUpdated;
+		}
+
 		old_infomask = mytup.t_data->t_infomask;
 		xmax = HeapTupleHeaderGetRawXmax(mytup.t_data);
 
@@ -4944,6 +4957,7 @@ l4:
 		}
 
 		/* tail recursion */
+		priorXmax = HeapTupleHeaderGetUpdateXid(mytup.t_data);
 		ItemPointerCopy(&(mytup.t_data->t_ctid), &tupid);
 		UnlockReleaseBuffer(buf);
 	}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
Shigeru Hanada shigeru.han...@gmail.com writes: I'm sorry but I don't see the point here. Do you mean that user executes CREATE FOREIGN VIEW in advance and uses the view in a Yes that's what I mean. I think it's nice to support executing ad-hoc remote query written in the syntax which is valid only on remote data source through FDW, and at the moment dblink interface seems feasible for that purpose. I guess the view query would have to be validated by the FDW, which would just receive a text. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock on object is already held
Daniel Wood dw...@salesforce.com writes: Does the original version of my stress test not repro the problem on 9.2? [ tries it ... ] No, it doesn't, or at least the MTBF is a couple orders of magnitude better than on 9.3. Another odd thing (seen with my short version as well as your original) is that 9.3/HEAD run the test case enormously faster than 9.2 and 9.1 do. The older versions seem to spend a lot of time sleeping, which I don't understand. Why does LockAcquireExtended() test for nLocks == 0 in the if (dontWait) block before calling RemoveLocalLock()? Looks like a useless test to me --- we wouldn't be here at all if nLocks had been positive to start with, and there's nothing in between that could raise the count. On the other hand, removing a LOCALLOCK that did have positive count would be disastrous. Maybe what would be more appropriate is an Assert(nLocks == 0) in RemoveLocalLock(). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
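A sketch of the suggestion (LOCALLOCK and RemoveLocalLock are the real names from src/backend/storage/lmgr/lock.c, but the body is elided here; this is not the actual implementation):

/*
 * Rather than testing nLocks == 0 in the dontWait path of
 * LockAcquireExtended(), assert it where a nonzero count would
 * actually be disastrous.
 */
static void
RemoveLocalLock(LOCALLOCK *locallock)
{
    Assert(locallock->nLocks == 0); /* removing a held lock would be fatal */

    /* ... existing removal of the LOCALLOCK hash table entry ... */
}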
Re: [HACKERS] Status of FDW pushdowns
I guess the view query would have to be validated by the FDW, which would just receive a text.

+1

This is not exactly in context, but David Fetter and I recently discussed how we could do a similar thing. This would work, but how can we do it for FDWs which do not parse SQL? Am I missing something here?

Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
Atri Sharma atri.j...@gmail.com writes: This would work,but how can we do it for FDWs which do not parse SQL? Am I missing something here? Worst case: CREATE FOREIGN VIEW foo AS $$ whatever syntax is accepted on the other side $$; Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Wed, Nov 27, 2013 at 11:08 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Atri Sharma atri.j...@gmail.com writes:

This would work, but how can we do it for FDWs which do not parse SQL? Am I missing something here?

Worst case: CREATE FOREIGN VIEW foo AS $$ whatever syntax is accepted on the other side

That doesn't sound like a very good idea. Can we add a function to the FDW API to define an SQL-to-foreign-server-side conversion? I am just musing though.

Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Handling GIN incomplete splits
On Wed, Nov 13, 2013 at 8:49 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's another part of my crusade against xlog cleanup routines. This series of patches gets rid of the gin_cleanup() function, which is currently used to finish splits of GIN b-tree pages, if the system crashes (or an error occurs) between splitting a page and inserting its downlink to the parent. The first three patches just move code around. IMHO they make the code more readable, so they should be committed in any case. The meat is in the fourth patch. Thoughts, objections? Alexander, I'm sorry if this conflicts with your GIN patches. Feel free to post the latest versions of your patches against the current master, ignoring patches. I can fix the bitrot. That said, I think these refactorings will make your code look a little bit nicer too, so you might want to rebase because of that anyway. Hi Heikki, The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the LWLock code during the operation below, with it trying to take an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the same lockid. CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes) WITH (FASTUPDATE=OFF); It happens pretty reliably using osm2pgsql. I will try to come up with a simple reproducible demonstration, and stack trace, over the weekend. Cheers, Jeff
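For readers unfamiliar with the failure mode: LWLocks are not re-entrant, so a code path that arrives at a lock it already holds blocks on itself. A minimal sketch (the lock variable is invented):

/* first acquisition succeeds */
LWLockAcquire(lockid, LW_EXCLUSIVE);

/* ... recursive descent ends up back at the same page/lock ... */

/* second acquisition never returns: the backend waits on itself */
LWLockAcquire(lockid, LW_EXCLUSIVE);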
Re: [HACKERS] Handling GIN incomplete splits
On 11/22/13 15:04, Michael Paquier wrote:

2) post recovery cleanup: - OK, so roughly the soul of this patch is to change the update mechanism for a left child gin page so that the parent split is always done first, before any new data is inserted in the child. This ensures that we can remove the xlog cleanup mechanism for gin page splits in the same fashion as gist... the xlog redo mechanism is then adapted according to that. - I did some tests with the patch: -- Index creation time: vanilla: 3266.834 ms; with the two patches: 3412.473 ms

Hmm. I didn't expect any change in index creation time. Is that repeatable, or within the margin of error?

2-1) In ginFindParents, is the case where the stack has no parent (aka the stack is the root itself) possible? Shouldn't this code path check if root is NULL or not?

No. A root split doesn't need to find the parent (because there isn't any), so we never call ginFindParents with a stack containing just the root. If you look at the only caller of ginFindParents, it's quite clear that stack->parent always exists.

2-2) Not sure that this structure is in line with the project policy: struct { BlockNumber left; BlockNumber right; } children; Why not add a complementary structure in gin_private.h doing that? It could be used as well in ginxlogSplit to specify a left/right family of block numbers.

I don't think we have a project policy against in-line structs. Might be better style not to do it, anyway, though. Come to think of it, that children variable mustn't be declared inside the if-statement; it was no longer in scope when the XLogInsert was done, so the pointer to children was pointing to a potentially uninitialized piece of stack. Valgrind would've complained. I ended up using BlockIdData[2] for that.

Committed, thanks for the review! - Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
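A distilled sketch of the scoping bug described above (identifiers simplified from the GIN code): the struct's address escapes the if-block that owns it, so XLogInsert() can end up reading stale stack memory.

XLogRecData rdata;

if (need_split)
{
    /* WRONG: 'children' ceases to exist at the closing brace below */
    struct
    {
        BlockNumber left;
        BlockNumber right;
    } children;

    children.left = leftblkno;
    children.right = rightblkno;

    rdata.data = (char *) &children;    /* the address escapes the block */
    rdata.len = sizeof(children);
    rdata.buffer = InvalidBuffer;
    rdata.next = NULL;
}

/* 'children' is out of scope here; XLogInsert() may read stale stack */
recptr = XLogInsert(rmid, info, &rdata);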
Re: [HACKERS] Platform-dependent(?) failure in timeout handling
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote:

3. Establish a coding rule that if you catch an error with PG_TRY() and don't re-throw it, you have to unblock signals in your PG_CATCH block.

Could that be done in the PG_END_TRY macro?

Interesting idea ... [ thinks for a bit ... ] but I'm not sure it's really a great fix. PG_END_TRY would have to unblock *all* signals, I think, and it's not clear you want that in that mechanism. Consider a PG_TRY executing inside a signal handler. It would be semantically reasonable to use sigsetjmp(foo,1) in PG_TRY, but as I said, I'm trying to avoid that on performance grounds.

In any case, after sleeping on it I've realized that the HOLD_INTERRUPTS dance I proposed adding to handle_sig_alarm() is necessary but not sufficient to deal with interruptions from SIGINT. The real reason that we need HOLD_INTERRUPTS there is that the signal handler may be executing while ImmediateInterruptOK is true. So if a random SIGINT happened to arrive while we're updating the timeout data structures, we could lose control and leave partially updated (ie, totally corrupted) timeout info. Adding HOLD_INTERRUPTS prevents that case.

But, imagine that SIGINT happens just as we enter handle_sig_alarm(), before we can do HOLD_INTERRUPTS. Then the SIGINT handler could longjmp out of the SIGALRM handler; the data structures are OK, but we've effectively lost that interrupt, and no new one is scheduled. This isn't much of a problem for the existing timer event handlers, because we'd just cancel them anyway on the way out of the (sub)transaction, cf LockErrorCleanup. But it could be a problem for future usages of the timer infrastructure.

In the case where we longjmp all the way out to the idle loop, we cancel all pending timeouts anyway in postgres.c's sigsetjmp cleanup stanza. It would be reasonable to add a PG_SETMASK(&UnBlockSig) in the same area. I am thinking that what we want to do is add a signal unblock step to (sub)transaction abort, too, as well as a call to timeout.c to let it re-establish its timer interrupt in case one got lost. This would fix the problem as long as you assume that anyone catching an arbitrary error condition will do a subtransaction abort to get back into a good state. But we're requiring that already.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
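A sketch of what coding rule #3 would look like at a call site. do_something_interruptible is a placeholder; PG_SETMASK, UnBlockSig, and FlushErrorState are the real backend facilities:

PG_TRY();
{
    do_something_interruptible();
}
PG_CATCH();
{
    /*
     * We are not re-throwing, and sigsetjmp(foo, 0) will not restore
     * the signal mask for us, so unblock signals explicitly before
     * continuing.
     */
    PG_SETMASK(&UnBlockSig);
    FlushErrorState();
}
PG_END_TRY();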
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
Boszormenyi Zoltan z...@cybertec.at writes: 2013-11-23 22:01 keltezéssel, Tom Lane írta: Boszormenyi Zoltan z...@cybertec.at writes: Attached is the patch that modified the command tag returned by the DECLARE CURSOR command. It returns DECLARE SCROLL CURSOR or DECLARE NO SCROLL CURSOR depending on the cursor's scrollable flag that can be determined internally even if neither is asked explicitly. This does not strike me as an acceptable change. It will break any code that's expecting the existing command tag, for little or no benefit to most applications. Even if backwards compatibility were of no concern, I'm not convinced it's a good thing to expose the backend's internal choices about query plans used for cursors, which is what this is basically doing. I saw code in the backend allowing a cursor to be scrollable, although it was not declared as such. How about ripping that out? That also fails the unnecessary-backwards-compatibility-break test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Wed, Nov 27, 2013 at 1:09 AM, Peter Geoghegan p...@heroku.com wrote: Since it took me a relatively long time to recreate this, it may not be trivial to do so. Unless you don't think it's useful to do so, I'm going to give this test a full 24 hours, just in case it shows up anything else like this.

I see a further, distinct error message this morning: ERROR: unrecognized heap_lock_tuple status: 1

This is a would-be "attempted to lock invisible tuple" error, but with the error raised by some heap_lock_tuple() call site, unlike the previous situation where heap_lock_tuple() raised the error directly. Since with the most recent revision we handle this (newly possible) return code in the new ExecLockHeapTupleForUpdateSpec() function, that just leaves EvalPlanQualFetch() as a plausible place to see it, given the codepaths exercised in the test case.

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
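For context, a simplified sketch of the call-site pattern implicated here. Real callers do more work per case; the point is that any HTSU_Result the switch does not enumerate falls into the catch-all elog, which is where a message like the one above comes from (HeapTupleInvisible is enum value 1 in the 9.3-era HTSU_Result):

HTSU_Result test;

/* test = heap_lock_tuple(...);  -- arguments elided in this sketch */
switch (test)
{
    case HeapTupleMayBeUpdated:
        /* got the lock */
        break;
    case HeapTupleUpdated:
        /* concurrently updated: chase the chain, e.g. via EvalPlanQualFetch() */
        break;
    case HeapTupleSelfUpdated:
        /* updated by the current command itself */
        break;
    default:
        /* an unhandled result, such as HeapTupleInvisible, lands here */
        elog(ERROR, "unrecognized heap_lock_tuple status: %d", test);
        break;
}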
Re: [HACKERS] Status of FDW pushdowns
On Wed, Nov 27, 2013 at 10:29:34AM -0300, Alvaro Herrera wrote: Shigeru Hanada escribió:

SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as the backend of the dblink interface, like this:

postgres=> SELECT dblink_connect('con1', 'server name of an FDW');
postgres=> SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...);

This provides a way to execute a query without defining a foreign table.

Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query.

WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. What we can't do is assume that our parser needs to, or even could, in principle, understand these things in more detail than that.

Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should we improve documentation on isolation levels?
I am not sure if I am posting in the right place - correct me if I am wrong. The following is not precise:

13.2.1. Read Committed Isolation Level: "Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT."

I think it should be re-worded as follows: "Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit after the first SELECT starts, and before the second SELECT starts."

The reason: there could be other DML running between these two SELECTs.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Should-we-improve-documentation-on-isolation-levels-tp5780629.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
Noah Misch n...@leadboat.com wrote:

(Kevin, I saw no attachment.)

Apologies. Trying again.

The threat is that rounding the read size up to the next MAXALIGN would cross into an unreadable memory page, resulting in a SIGSEGV. Every palloc chunk has MAXALIGN'd size under the hood, so the excess read of toDelete cannot cause a SIGSEGV. For a stack variable, it depends on the ABI. I'm not aware of an ABI where the four bytes past the end of this stack variable could be unreadable, which is not to claim I'm well-read on the topic. We should fix this in due course on code hygiene grounds, but I would not back-patch it.

If you're sure. I hadn't worked through the code, but had two concerns (neither of which was about a SIGSEGV):

(1) That multiple MAXALIGNs of shorter values could push the structure into overlap with the next thing on the stack, allowing one or the other to get stepped on.

(2) That the CRC calculation might pick up uninitialized data which was not actually going to match what was used during recovery, leading to end of recovery on replay.

If you are confident that neither of these is a real risk, I'll relax about this.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

==7232==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffbcb2e834 at pc 0xef0335 bp 0x7fffbcb29b30 sp 0x7fffbcb29b28
READ of size 1 at 0x7fffbcb2e834 thread T0
    #0 0xef0334 in XLogInsert /home/kgrittn/pg/master/src/backend/access/transam/xlog.c:1040
    #1 0xd53792 in doPickSplit /home/kgrittn/pg/master/src/backend/access/spgist/spgdoinsert.c:1391
    #2 0xd0d9f1 in spgdoinsert /home/kgrittn/pg/master/src/backend/access/spgist/spgdoinsert.c:2008
    #3 0xcbb4d1 in spginsert /home/kgrittn/pg/master/src/backend/access/spgist/spginsert.c:238
    #4 0x46f9de3 in FunctionCall6Coll /home/kgrittn/pg/master/src/backend/utils/fmgr/fmgr.c:1436
    #5 0xad43fb in index_insert /home/kgrittn/pg/master/src/backend/access/index/indexam.c:223
    #6 0x2122fcd in ExecInsertIndexTuples /home/kgrittn/pg/master/src/backend/executor/execUtils.c:1104
    #7 0x228413f in ExecInsert /home/kgrittn/pg/master/src/backend/executor/nodeModifyTable.c:274
    #8 0x227fba8 in ExecModifyTable /home/kgrittn/pg/master/src/backend/executor/nodeModifyTable.c:1014
    #9 0x2026b03 in ExecProcNode /home/kgrittn/pg/master/src/backend/executor/execProcnode.c:377
    #10 0x1fef534 in ExecutePlan /home/kgrittn/pg/master/src/backend/executor/execMain.c:1474
    #11 0x1fee488 in standard_ExecutorRun /home/kgrittn/pg/master/src/backend/executor/execMain.c:308
    #12 0x1fec7e9 in ExecutorRun /home/kgrittn/pg/master/src/backend/executor/execMain.c:256
    #13 0x34f05ab in ProcessQuery /home/kgrittn/pg/master/src/backend/tcop/pquery.c:185
    #14 0x34e8c3a in PortalRunMulti /home/kgrittn/pg/master/src/backend/tcop/pquery.c:1279
    #15 0x34e1b9c in PortalRun /home/kgrittn/pg/master/src/backend/tcop/pquery.c:816
    #16 0x34b6721 in exec_simple_query /home/kgrittn/pg/master/src/backend/tcop/postgres.c:1054
    #17 0x34b1420 in PostgresMain /home/kgrittn/pg/master/src/backend/tcop/postgres.c:3998
    #18 0x2f1f925 in BackendRun /home/kgrittn/pg/master/src/backend/postmaster/postmaster.c:4085
    #19 0x2f1b830 in BackendStartup /home/kgrittn/pg/master/src/backend/postmaster/postmaster.c:3774
    #20 0x2efcc96 in ServerLoop /home/kgrittn/pg/master/src/backend/postmaster/postmaster.c:1585
    #21 0x2ef13ee in PostmasterMain /home/kgrittn/pg/master/src/backend/postmaster/postmaster.c:1240
    #22 0x24dc3c3 in main /home/kgrittn/pg/master/src/backend/main/main.c:196
    #23 0x2b89d526e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
    #24 0x4dc3cc in _start ??:?
Address 0x7fffbcb2e834 is located in stack of thread T0 at offset 2644 in frame
    #0 0xd2b5ef in doPickSplit /home/kgrittn/pg/master/src/backend/access/spgist/spgdoinsert.c:680
This frame has 57 object(s):
    [32, 40) '' [96, 104) '' [160, 168) '' [224, 232) '' [288, 296) '' [352, 356) '' [416, 417) '' [480, 481) ''
    [544, 545) 'insertedNew' [608, 632) 'in' [672, 720) 'out' [768, 776) 'procinfo' [832, 833) 'includeNew'
    [896, 900) 'i' [960, 964) 'max' [1024, 1028) 'n' [1088, 1096) 'innerTuple' [1152, 1160) 'node'
    [1216, 1224) 'nodes' [1280, 1284) 'newInnerBuffer' [1344, 1348) 'newLeafBuffer' [1408, 1416) 'heapPtrs'
    [1472, 1480) 'leafPageSelect' [1536, 1544) 'leafSizes' [1600, 1608) 'toDelete' [1664, 1672) 'toInsert'
    [1728, 1730) 'redirectTuplePos' [1792, 1796) 'startOffsets' [1856, 1864) 'newLeafs' [1920, 1924) 'spaceToDelete'
    [1984, 1988) 'currentFreeSpace' [2048, 2052) 'totalLeafSizes' [2112, 2113) 'allTheSame' [2176, 2496) 'rdata'
    [2528, 2532) 'nRdata' [2592, 2644) 'xlrec'
== Memory access
Re: [HACKERS] logical changeset generation v6.7
On Thu, Nov 14, 2013 at 11:46 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-12 18:50:33 +0100, Andres Freund wrote:

You've actually changed the meaning of this section (and not in a good way):

 be set at server start. <varname>wal_level</varname> must be set
-to <literal>archive</literal> or <literal>hot_standby</literal> to allow
-connections from standby servers.
+to <literal>archive</literal>, <literal>hot_standby</literal> or <literal>logical</literal>
+to allow connections from standby servers.

I think that the previous text meant that you needed archive - or, if you want to allow connections, hot_standby. The new text loses that nuance.

Yea, that's because it was lost on me in the first place... I think that's because the nuance isn't actually in the text - note that it is talking about max_wal_senders and talking about connections *from*, not *to* standby servers. I've reformulated the wal_level paragraph and used "or higher" in several places now.

Ok, so here's a rebased version of this. I tried to fix all the issues you mentioned, and it's based on the split-off IsSystemRelation() patch I sent yesterday (included here).

Hello, I tried to apply the patches, but some of them show warnings/errors:

$ gunzip -c /home/fabrizio/Downloads/0002-wal_decoding-Add-wal_level-logical-and-log-data-requ.patch.gz | git apply -
warning: src/backend/access/transam/xlog.c has type 100755, expected 100644

$ gunzip -c /home/fabrizio/Downloads/0005-wal_decoding-Introduce-wal-decoding-via-catalog-time.patch.gz | git apply -
warning: src/backend/access/transam/xlog.c has type 100755, expected 100644

$ gunzip -c /home/fabrizio/Downloads/0006-wal_decoding-Implement-VACUUM-FULL-CLUSTER-support-v.patch.gz | git apply -
warning: src/backend/access/transam/xlog.c has type 100755, expected 100644

$ gunzip -c /home/fabrizio/Downloads/0007-wal_decoding-Only-peg-the-xmin-horizon-for-catalog-t.patch.gz | git apply -
warning: src/backend/access/transam/xlog.c has type 100755, expected 100644

$ gunzip -c /home/fabrizio/Downloads/0011-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz | git apply -
error: patch failed: src/bin/pg_basebackup/streamutil.c:210
error: src/bin/pg_basebackup/streamutil.c: patch does not apply

The others apply correctly. The permission warning must be fixed, and 0011 must be rebased.

Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
2013-11-27 19:16 keltezéssel, Tom Lane írta: Boszormenyi Zoltan z...@cybertec.at writes: 2013-11-23 22:01 keltezéssel, Tom Lane írta: Boszormenyi Zoltan z...@cybertec.at writes:

Attached is the patch that modified the command tag returned by the DECLARE CURSOR command. It returns DECLARE SCROLL CURSOR or DECLARE NO SCROLL CURSOR depending on the cursor's scrollable flag that can be determined internally even if neither is asked explicitly.

This does not strike me as an acceptable change. It will break any code that's expecting the existing command tag, for little or no benefit to most applications. Even if backwards compatibility were of no concern, I'm not convinced it's a good thing to expose the backend's internal choices about query plans used for cursors, which is what this is basically doing.

I saw code in the backend allowing a cursor to be scrollable, although it was not declared as such. How about ripping that out?

That also fails the unnecessary-backwards-compatibility-break test.

If you read the rest of the mail, it turns out it wasn't a serious question. Getting the SCROLL / NO SCROLL flags from the preprocessor is no problem. The only problem is when it's unspecified. Treating it as NO SCROLL (or adding it to the DECLARE command behind the application's back) would break apps that want to scroll backward. (Not ABSOLUTE.) On the other hand, what problems would one face when adding SCROLL implicitly if it's unspecified? It's not a workable solution either, see below.

As the documentation suggests, for an application that wants to use UPDATE/DELETE WHERE CURRENT OF ..., it's highly recommended that the cursor be for a FOR UPDATE query. Watch this scenario:

zozo=> begin;
BEGIN
zozo=> declare mycur cursor for select * from t1 for update;
DECLARE CURSOR
zozo=> fetch from mycur;
 id | t
----+---
  1 | a
(1 row)

zozo=> fetch from mycur;
 id | t
----+---
  2 | b
(1 row)

zozo=> update t1 set t=t||'_x' where current of mycur;
UPDATE 1
zozo=> fetch from mycur;
 id | t
----+---
  3 | c
(1 row)

zozo=> delete from t1 where current of mycur;
DELETE 1
zozo=> move absolute 0 in mycur;
MOVE 0
zozo=> fetch from mycur;
 id | t
----+---
  1 | a
(1 row)

zozo=> fetch from mycur;
 id | t
----+---
(0 rows)

Although the server complains about MOVE BACKWARD, it's not complaining about MOVE ABSOLUTE, even though it's clearly moving backward. The cursor position is tracked in the backend in a long variable and it's not overflowed. This is also legacy behaviour; changing it would break backward compatibility.

The other problem I see is with the documentation: it says that the INSENSITIVE keyword is just a placeholder, all cursors are insensitive. That's clearly false. Moving back to the start, previously existing rows won't show up again. It's not strictly a sensitive cursor, either, because the row with id=2 would show up with the new value of t. This neither-sensitive-nor-insensitive behaviour is what the SQL standard calls an asensitive cursor. It would be worth a doc change. This is what's written in 9.3: "If the cursor's query includes FOR UPDATE or FOR SHARE, then returned rows are locked at the time they are first fetched, in the same way as for a regular SELECT ( http://www.postgresql.org/docs/9.3/interactive/sql-select.html ) command with these options. In addition, the returned rows will be the most up-to-date versions; therefore these options provide the equivalent of what the SQL standard calls a sensitive cursor. (Specifying INSENSITIVE together with FOR UPDATE or FOR SHARE is an error.)"
( http://www.postgresql.org/docs/9.3/interactive/sql-declare.html )

However, if the cursor is declared without FOR UPDATE, then the explicit SCROLL keyword (or the implicit one, if the query is simple), scrolling backward, and DML with WHERE CURRENT OF are all allowed. In this case, the cursor is really insensitive: FETCH statements after MOVE ABSOLUTE 0 return all rows with their original data.

This is just to show that adding SCROLL behind the application's back is also pointless. If the query (which can also be a prepared statement in ECPG) contains FOR UPDATE, adding SCROLL to the DECLARE statement would make it fail.

If you consider all these:
- certain combinations of query and DECLARE stmt flags fail;
- adding NO SCROLL is breaking backward compatibility;
- the readahead code has to really know whether the cursor is scrollable so it can behave just like the server;
then returning the SCROLL / NO SCROLL flag in the command tag is not a bad solution in my view. In fact, this was the only workable solution I could come up with to make it work reliably when neither SCROLL nor NO SCROLL is specified by the application.

Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] Should we improve documentation on isolation levels?
AK alk...@gmail.com wrote: The following is not precise: 13.2.1. Read Committed Isolation Level Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT. I think it should be re-worded as follows Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit after the first SELECT starts, and before the second SELECT starts. The reason: there could be other DML running between these two SELECTs. That's a fair point. I'll commit something to the master branch. I don't think this rises to the level of a documentation bug that needs to be back-patched. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
Boszormenyi Zoltan escribió: If you consider all these: - certain combinations of query and DECLARE stmt flags fail; - adding NO SCROLL is breaking backward compatibility; - the readahead code has to really know whether the cursor is scrollable so it can behave just like the server; then returning the SCROLL / NO SCROLL flag in the command tag is not a bad solution in my view. In fact, this was the only workable solution I could come up with to make it work reliably when neither SCROLL nor NO SCROLL is specified by the application. Would it work to have a function of some sort to which you give a cursor name and it returns whether it is scrollable or not? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
How would you characterize the chances of this happening with default *vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter this bug during each of ~10 generations of autovacuum_freeze_max_age before the old rows actually become invisible.

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote: With regard to fixing things up, ISTM the best bet is heap_prune_chain() so far. That's executed by vacuum and by opportunistic pruning, and we know we have the appropriate locks there. Looks relatively easy to fix things up there. Not sure if there are any possible routes to WAL-log this other than using log_newpage()? I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is poor. I would want a way to extract tuples having xmin outside the range of CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page. At first, I supposed we could offer a tool to blindly freeze such tuples. However, there's no guarantee that they are in harmony with recent changes to the database; transactions that wrongly considered those tuples invisible may have made decisions incompatible with their existence. For example, reviving such a tuple could violate a UNIQUE constraint if the user had already replaced the missing row manually. A module that offers SELECT * FROM rows_wrongly_invisible('anytable') would aid manual cleanup efforts. freeze_if_wrongly_invisible(tid) would be useful, too.

Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we improve documentation on isolation levels?
I concur - the documentation is not incorrect, it is just incomplete. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Should-we-improve-documentation-on-isolation-levels-tp5780629p5780636.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 28, 2013 at 12:54 AM, David Fetter da...@fetter.org wrote: On Wed, Nov 27, 2013 at 10:29:34AM -0300, Alvaro Herrera wrote: Shigeru Hanada escribió:

SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as the backend of the dblink interface, like this:

postgres=> SELECT dblink_connect('con1', 'server name of an FDW');
postgres=> SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...);

This provides a way to execute a query without defining a foreign table.

Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query.

WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand.

+1

The input-output formats need to be defined clearly. How about sending parse trees? Is it even possible?

What we can't do is assume that our parser needs to, or even could, in principle, understand these things in more detail than that.

Agreed. I wonder if it's possible to give this task to the FDW implementing authority instead, and get the FDW to translate to the required format.

-- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
On Wed, Nov 27, 2013 at 11:38:23AM -0800, Kevin Grittner wrote: Noah Misch n...@leadboat.com wrote:

The threat is that rounding the read size up to the next MAXALIGN would cross into an unreadable memory page, resulting in a SIGSEGV. Every palloc chunk has MAXALIGN'd size under the hood, so the excess read of toDelete cannot cause a SIGSEGV. For a stack variable, it depends on the ABI. I'm not aware of an ABI where the four bytes past the end of this stack variable could be unreadable, which is not to claim I'm well-read on the topic. We should fix this in due course on code hygiene grounds, but I would not back-patch it.

If you're sure. I hadn't worked through the code, but had two concerns (neither of which was about a SIGSEGV): (1) That multiple MAXALIGNs of shorter values could push the structure into overlap with the next thing on the stack, allowing one or the other to get stepped on.

These are out-of-bounds reads only, not writes. Also, the excess doesn't accumulate that way; the code reads beyond any particular stack variable or palloc chunk by no more than 7 bytes.

(2) That the CRC calculation might pick up uninitialized data which was not actually going to match what was used during recovery, leading to end of recovery on replay.

The CRC calculation does pick up unspecified bytes, but we copy those same bytes into the actual WAL record. The effect is similar to that of unspecified pad bytes in the middle of xlrec structures.

If you are confident that neither of these is a real risk, I'll relax about this.

If there is a real risk, I'm not seeing it.

-- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we improve documentation on isolation levels?
AK alk...@gmail.com wrote: the documentation is not incorrect, it is just incomplete. Yeah, that's exactly how I saw it. :-) Docs changed on master only. Thanks for the report! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
Boszormenyi Zoltan z...@cybertec.at writes: If you consider all these: - certain combinations of query and DECLARE stmt flags fail; - adding NO SCROLL is breaking backward compatibility; - the readahead code has to really know whether the cursor is scrollable so it can behave just like the server; If you're claiming that readahead inside ECPG will behave absolutely transparently in all cases, I think that's bogus anyway. Consider for example a query that will get a divide-by-zero error when it computes the 110th row --- but the application stops after fetching 100 rows. Everything's fine, until you insert some readahead logic. Queries containing volatile functions might also not be happy about readahead. Given these considerations, I think it'd be better to allow explicit application control over whether read-ahead happens for a particular query. And I have no problem whatsoever with requiring that the cursor be explicitly marked SCROLL or NO SCROLL before read-ahead will occur. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
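In ECPG terms, the rule Tom proposes would look like this sketch (table and host variables invented); read-ahead would engage only for the first cursor:

EXEC SQL BEGIN DECLARE SECTION;
int  id;
char t[64];
EXEC SQL END DECLARE SECTION;

/* explicitly SCROLL: the library is allowed to read ahead and cache rows */
EXEC SQL DECLARE mycur SCROLL CURSOR FOR SELECT id, t FROM t1;
EXEC SQL OPEN mycur;
EXEC SQL FETCH FROM mycur INTO :id, :t;

/* scrollability unspecified: read-ahead stays disabled */
EXEC SQL DECLARE plaincur CURSOR FOR SELECT id, t FROM t1;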
[HACKERS] bytea_ouput = escape vs encode(byte, 'escape')
I'm wondering why bytea_output = escape produces different output than encode(bytea, 'escape') does. Is this intentional? If so, why?

cnuapp_prod@postgres=# select e'\r'::bytea AS cr, e'\n'::bytea AS lf;
  cr  |  lf
------+------
 \x0d | \x0a
(1 row)

cnuapp_prod@postgres=# set bytea_output = escape;
SET
cnuapp_prod@postgres=# select e'\r'::bytea AS cr, e'\n'::bytea AS lf;
  cr  |  lf
------+------
 \015 | \012
(1 row)

cnuapp_prod@postgres=# select encode(e'\r'::bytea,'escape') AS cr, encode(e'\n'::bytea, 'escape') AS lf;
 cr | lf
----+----
 \r |   +
    |
(1 row)

cnuapp_prod@postgres=#

-- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Tue, Nov 26, 2013 at 5:02 PM, Bruce Momjian br...@momjian.us wrote: On Tue, Nov 26, 2013 at 01:58:04PM -0300, Alvaro Herrera wrote: Bruce Momjian escribió: On Tue, Nov 26, 2013 at 11:22:39AM -0500, Tom Lane wrote:

Uh, I ended up mentioning "no effect" to highlight it does nothing, rather than mention a warning. Would people prefer I say "warning"? Or should I say "issues a warning because it has no effect" or something? It is easy to change.

I'd revert the change Robert highlights above. ISTM you've changed the code to match the documentation; why would you then change the docs?

Well, I did it to make it consistent. The question is what to write for _all_ of the new warnings, including SET. Do we say "warning", do we say it has no effect, or do we say both? The ABORT is just one case of that.

Maybe "it emits a warning and otherwise has no effect"? Emitting a warning is certainly not doing nothing; as has been pointed out in the SSL renegotiation thread, it might cause the log to fill disk.

OK, doc patch attached.

Seems broadly reasonable, but I'd use "no other effect" throughout.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 28, 2013 at 01:29:46AM +0530, Atri Sharma wrote: On Thu, Nov 28, 2013 at 12:54 AM, David Fetter da...@fetter.org wrote: On Wed, Nov 27, 2013 at 10:29:34AM -0300, Alvaro Herrera wrote: Shigeru Hanada escribió: SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction between client. Besides it, I think it would be nice to allow arbitrary FDW as backend of dblink interface like this: postgres= SELECT dblink_connect('con1', 'server name of an FDW'); postgres= SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...); This provides a way to execute query without defining foreign table. Seems to me that if you want to read remote tables without creating a foreign table, you could define them locally using something like the WITH syntax and then use them normally in the rest of the query. WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. +1 The input-output formats need to be defined clearly. How about sending parse trees? Is it even possible? I don't see why parse trees wouldn't be something that could eventually be sent to other PostgreSQL servers, but I see that whole discussion as orthogonal to this one. My point here is that there needs to be an escape to native system available in SQL so people can communicate directly with the remote systems in the systems' own languages. It's a little bit analogous to making assembler available from C, or C from HLLs. What we can't do is assume that our parser needs to, or even could, in principle, understand these things in more detail than that. Agreed. I wonder if its possible to give this task to the FDW implementing authority instead, and get FDW to translate to the required format. I don't know that the FDW would necessarily need to get involved except in the sense of full recognition before processing. http://langsec.org/occupy/ Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bytea_ouput = escape vs encode(byte, 'escape')
Jim Nasby wrote:

I'm wondering why bytea_output = escape produces different output than encode(byte, 'escape') does. Is this intentional? If so, why?

cnuapp_prod@postgres=# select e'\r'::bytea AS cr, e'\n'::bytea AS lf;
  cr  |  lf
------+------
 \x0d | \x0a
(1 row)

cnuapp_prod@postgres=# set bytea_output = escape;
SET
cnuapp_prod@postgres=# select e'\r'::bytea AS cr, e'\n'::bytea AS lf;
  cr  |  lf
------+------
 \015 | \012
(1 row)

cnuapp_prod@postgres=# select encode(e'\r'::bytea,'escape') AS cr, encode(e'\n'::bytea, 'escape') AS lf;
 cr | lf
----+----
 \r |   +
    |
(1 row)

encode takes a bytea and provides what it would be as a text value (using the specified encoding to perform the conversion). The bytea output examples are simple output of the contents of the byte array, without any supposition as to what those bytes represent. It is strictly a serialization format, and not an encoding/decoding of the contents. In this example the two functions are acting as paired input/output. I'm thinking the direction you are assuming from the word encode is confusing you - as it did me at first.

David J.
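A small demonstration of the pairing David describes: decode() inverts encode() and round-trips the bytes, while bytea_output only changes how a bytea value is printed. (A minimal sketch; the equality check should return true.)

SELECT decode(encode(e'\r\n'::bytea, 'escape'), 'escape') = e'\r\n'::bytea;  -- t
SET bytea_output = escape;   -- affects display only, not the stored bytes
SELECT e'\r\n'::bytea;       -- \015\012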
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
On 11/27/13, 2:49 PM, Alvaro Herrera wrote:

Would it work to have a function of some sort to which you give a cursor name and it returns whether it is scrollable or not?

That might make sense. I think this case is similar to the question of whether a view is updatable. You wouldn't put that information in the CREATE VIEW command tag.
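As a partial precedent on the server side, the pg_cursors system view already reports this for cursors open in the current session (it does not help a client library know the answer at declare time, which is the ECPG problem). For example:

BEGIN;
DECLARE c1 SCROLL CURSOR FOR SELECT generate_series(1, 10);
SELECT name, is_scrollable, is_holdable FROM pg_cursors;
COMMIT;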
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
On 11/27/13, 3:47 PM, Tom Lane wrote:

Given these considerations, I think it'd be better to allow explicit application control over whether read-ahead happens for a particular query. And I have no problem whatsoever with requiring that the cursor be explicitly marked SCROLL or NO SCROLL before read-ahead will occur.

Well, technically, unspecified means NO SCROLL according to the SQL standard. A lot of applications in ECPG are ported from other systems, which might make that assumption. It wouldn't be very nice to have to change all that.
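In plain SQL the two explicit forms behave as follows; the question above is what ECPG should assume when a program says neither. (A minimal sketch; the final FETCH is the one that errors, which is why the block ends with ROLLBACK.)

BEGIN;
DECLARE c_scroll SCROLL CURSOR FOR SELECT * FROM pg_class;
DECLARE c_std NO SCROLL CURSOR FOR SELECT * FROM pg_class;  -- the SQL-standard default
FETCH 5 FROM c_scroll;
FETCH BACKWARD 2 FROM c_scroll;  -- allowed
FETCH 5 FROM c_std;
FETCH BACKWARD 2 FROM c_std;     -- ERROR:  cursor can only scan forward
ROLLBACK;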
Re: [HACKERS] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote: Amit Kapila wrote: On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:

Surely VACUUM FULL should rebuild the visibility map, and make tuples in the new relation all-visible, no? Certainly it seems odd to me that VACUUM FULL leaves the table in a less-well-maintained state in terms of visibility than a normal vacuum. VACUUM FULL should not need to be followed by another VACUUM.

I think it cannot be made all-visible.

I don't think all tuples in the relation are necessarily visible to all transactions, but the ones which are should probably be flagged that way.

I have developed the attached proof-of-concept patch to fix the problem of having no visibility map after CLUSTER or VACUUM FULL. I tested with these queries:

	CREATE TABLE test(x INT PRIMARY KEY);
	INSERT INTO test VALUES (1);
	VACUUM FULL test;  -- or CLUSTER
	SELECT relfilenode FROM pg_class WHERE relname = 'test';
	 relfilenode
	-------------
	       16399
	(1 row)

Then 'ls -l data/base/16384/16399*' to see the *_vm file. I am not sure how to test that the vm contents are valid.

This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior does not do writes through the shared buffer cache, as outlined in this C comment block:

	 * We can't use the normal heap_insert function to insert into the new
	 * heap, because heap_insert overwrites the visibility information.
	 * We use a special-purpose raw_heap_insert function instead, which
	 * is optimized for bulk inserting a lot of tuples, knowing that we have
	 * exclusive access to the heap.  raw_heap_insert builds new pages in
	 * local storage.  When a page is full, or at the end of the process,
	 * we insert it to WAL as a single record and then write it to disk
	 * directly through smgr.  Note, however, that any data sent to the new
	 * heap's TOAST table will go through the normal bufmgr.

I originally tried to do this higher up in the stack but ran into problems because I couldn't access the new heap page, so I had to do it at the non-shared-buffer page level. I reused the lazy vacuum routines. I need to know if this is the right approach, and need to know what things are wrong or missing.

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
+ Everyone has their own god. +
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
new file mode 100644
index 951894c..c01a6a8
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 107,112 ****
--- 107,114 ----
  #include "access/rewriteheap.h"
  #include "access/transam.h"
  #include "access/tuptoaster.h"
+ #include "access/visibilitymap.h"
+ #include "commands/vacuum.h"
  #include "storage/bufmgr.h"
  #include "storage/smgr.h"
  #include "utils/memutils.h"
*************** typedef OldToNewMappingData *OldToNewMap
*** 172,177 ****
--- 174,180 ----
  
  /* prototypes for internal functions */
  static void raw_heap_insert(RewriteState state, HeapTuple tup);
+ static void update_page_vm(Relation relation, Page page, BlockNumber blkno);
  
  /*
*************** end_heap_rewrite(RewriteState state)
*** 281,286 ****
--- 284,290 ----
  		RelationOpenSmgr(state->rs_new_rel);
  
  		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
+ 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
  
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
*************** raw_heap_insert(RewriteState state, Heap
*** 633,638 ****
--- 637,643 ----
  		RelationOpenSmgr(state->rs_new_rel);
  
  		PageSetChecksumInplace(page, state->rs_blockno);
+ 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
  
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) page, true);
*************** raw_heap_insert(RewriteState state, Heap
*** 677,679 ****
--- 682,704 ----
  	if (heaptup != tup)
  		heap_freetuple(heaptup);
  }
+ 
+ static void
+ update_page_vm(Relation relation, Page page, BlockNumber blkno)
+ {
+ 	Buffer		vmbuffer = InvalidBuffer;
+ 	TransactionId visibility_cutoff_xid;
+ 
+ 	visibilitymap_pin(relation, blkno, &vmbuffer);
+ 	Assert(BufferIsValid(vmbuffer));
+ 
+ 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
+ 		heap_page_is_all_visible(relation, InvalidBuffer, page,
+ 								 &visibility_cutoff_xid))
+ 	{
+ 		PageSetAllVisible(page);
+ 		visibilitymap_set(relation, blkno, InvalidBlockNumber,
+ 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
+ 	}
+ 	ReleaseBuffer(vmbuffer);
+ }
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
new file mode 100644
index 7f40d89..a42511b
*** a/src/backend/access/heap/visibilitymap.c
--- b/src/backend/access/heap/visibilitymap.c
***
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Wed, Nov 27, 2013 at 06:05:13AM -0800, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: On Tue, Nov 26, 2013 at 03:25:44PM -0800, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote:

How are we handling breakage of pg_dump, not pg_dumpall?

That was discussed. Do you have something to add?

I am confused what we are patching. Are we patching pg_dump, pg_dumpall, or both?

Just pg_dumpall.c.

OK, there was a pg_dump patch earlier which we are not using now. Are we propagating other settings from pg_dump to pg_dumpall, like statement_timeout?

pg_dumpall output sets up the global objects (including their properties) and then does a \connect to each database, followed by the same output that pg_dump would generate for that database. That includes the SET statements for statement_timeout, etc. The patch does nothing to change what objects or properties the pg_dumpall output tries to set up; it just sets a property *on the current connection* to allow those statements to run without error.

What is the logic that has us setting statement_timeout in pg_dump but default_transaction_read_only in pg_dumpall?

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
+ Everyone has their own god. +
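For illustration, this is the failure mode being worked around; the database and table names here are invented:

ALTER DATABASE proddb SET default_transaction_read_only = on;
\connect proddb
CREATE TABLE t (x int);   -- ERROR:  cannot execute CREATE TABLE in a read-only transaction
SET default_transaction_read_only = off;   -- what the patched pg_dumpall output emits
CREATE TABLE t (x int);   -- ok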
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 2013-11-27 14:53:27 -0500, Noah Misch wrote:

How would you characterize the chances of this happening with default *vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter this bug during each of ~10 generations of autovacuum_freeze_max_age before the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen quite a bit more often. But in some workloads it's pretty darn easy to hit. E.g. if significant parts of the table are regularly deleted, lots, if not most, of your vacuums will spuriously increase relfrozenxid above the actual value. Each time only by a small amount, but due to that small increase there never will be an actual full-table vacuum, since freeze_table_age will never even remotely be reached. The client that made me look into the issue noticed problems on pg_attribute - presumably because of temporary table usage primarily affecting the tail end of pg_attribute.

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:

With regard to fixing things up, ISTM the best bet is heap_prune_chain() so far. That's executed by vacuum and by opportunistic pruning, and we know we have the appropriate locks there. Looks relatively easy to fix up things there. Not sure if there are any possible routes to WAL-log this but using log_newpage()? I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is poor. I would want a way to extract tuples having xmin outside the range of CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page.

I think the likelihood of the problem affecting !all-visible pages is close to zero. Each vacuum will try to clean those, so they surely will get vacuumed at some point. I think the only way that could happen is if the ConditionalLockBufferForCleanup() fails in each vacuum. And that seems a bit unlikely.

At first, I supposed we could offer a tool to blindly freeze such tuples. However, there's no guarantee that they are in harmony with recent changes to the database; transactions that wrongly considered those tuples invisible may have made decisions incompatible with their existence. For example, reviving such a tuple could violate a UNIQUE constraint if the user had already replaced the missing row manually.

Good point, although since they are all on all-visible pages, sequential scans will currently already find those. It's primarily index scans that won't. So it's not really reviving them... The primary reason why I think it might be a good idea to revive automatically is that an eventual full-table/freeze vacuum will currently delete them, which seems bad.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
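A quick way to eyeball the freeze horizons in question (this only shows what the catalogs claim, not whether the claim is actually correct):

SELECT relname, relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;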
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
On 2013-11-27 15:29:24 -0500, Noah Misch wrote:

If you are confident that neither of these is a real risk, I'll relax about this. If there is a real risk, I'm not seeing it.

Me neither.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Wed, Nov 27, 2013 at 03:59:31PM -0500, Robert Haas wrote: On Tue, Nov 26, 2013 at 5:02 PM, Bruce Momjian br...@momjian.us wrote: On Tue, Nov 26, 2013 at 01:58:04PM -0300, Alvaro Herrera wrote: Bruce Momjian wrote: On Tue, Nov 26, 2013 at 11:22:39AM -0500, Tom Lane wrote:

Uh, I ended up mentioning "no effect" to highlight it does nothing, rather than mention a warning. Would people prefer I say "warning"? Or should I say "issues a warning because it has no effect", or something? It is easy to change.

I'd revert the change Robert highlights above. ISTM you've changed the code to match the documentation; why would you then change the docs?

Well, I did it to make it consistent. The question is what to write for _all_ of the new warnings, including SET. Do we say "warning", do we say it has no effect, or do we say both? The ABORT is just one case of that.

Maybe "it emits a warning and otherwise has no effect"? Emitting a warning is certainly not doing nothing; as has been pointed out in the SSL renegotiation thread, it might cause the log to fill disk.

OK, doc patch attached.

Seems broadly reasonable, but I'd use "no other effect" throughout.

That sounds awkward, e.g.:

	Issuing <command>ROLLBACK</> outside of a transaction block emits a warning but has no other effect.

I could live with this:

	Issuing <command>ROLLBACK</> outside of a transaction block has no effect except emitting a warning.

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] MultiXact pessmization in 9.3
Alvaro Herrera wrote:

Correct. The only difficulty here is that we would need to pass down the fact that we know for certain that this is only a locking multixact. There are some callers that go to it indirectly via MultiXactIdWait or MultiXactIdExpand, but now that I look I think it's fine for those to pass false (i.e. assume there might be an update and disable the optimization), since those aren't hot compared to the other cases. This patch implements this idea, but I haven't tested it much beyond compiling and ensuring it passes the existing tests.

.. and it turns out it doesn't work. To be really effective, we need MultiXactIdIsRunning to be passed down the flag too, so it can pass it to GetMultiXactIdMembers.

One other thought is that MultiXactIdIsRunning and GetMultiXactIdMembers are public functions, so this patch would represent an API change in 9.3. I doubt any external modules would be relying on these functions, but there's so much care and thought put into avoiding API changes on released versions that I'm nervous about doing it here. So I think we'd need to provide a compatibility shim to avoid that. (I generally dislike keeping compatibility stuff forever, so I would provide these backward-compatible functions in 9.3 only. Anyone using them would have to fix the code and recompile for 9.4+. This means a #ifdef in code meant to work on top of both 9.3 and 9.4. Anyone opine otherwise?)

The other idea is to just not backpatch this.

Other than that, this patch implements the optimization suggested here.

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/contrib/pgrowlocks/pgrowlocks.c
--- b/contrib/pgrowlocks/pgrowlocks.c
*************** pgrowlocks(PG_FUNCTION_ARGS)
*** 168,174 ****
  				allow_old = !(infomask & HEAP_LOCK_MASK) &&
  					(infomask & HEAP_XMAX_LOCK_ONLY);
! 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old);
  				if (nmembers == -1)
  				{
  					values[Atnum_xids] = "{0}";
--- 168,175 ----
  				allow_old = !(infomask & HEAP_LOCK_MASK) &&
  					(infomask & HEAP_XMAX_LOCK_ONLY);
! 				nmembers = GetMultiXactIdMembers(xmax, &members, allow_old,
! 												 false);
  				if (nmembers == -1)
  				{
  					values[Atnum_xids] = "{0}";
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** l3:
*** 3987,3993 ****
  				 * the case, HeapTupleSatisfiesUpdate would have returned
  				 * MayBeUpdated and we wouldn't be here.
  				 */
! 				nmembers = GetMultiXactIdMembers(xwait, &members, false);
  
  				for (i = 0; i < nmembers; i++)
  				{
--- 3987,3995 ----
  				 * the case, HeapTupleSatisfiesUpdate would have returned
  				 * MayBeUpdated and we wouldn't be here.
  				 */
! 				nmembers =
! 					GetMultiXactIdMembers(xwait, &members, false,
! 										  HEAP_XMAX_IS_LOCKED_ONLY(infomask));
  
  				for (i = 0; i < nmembers; i++)
  				{
*************** l3:
*** 4008,4014 ****
  					}
  				}
  
! 				pfree(members);
  			}
  
  			/*
--- 4010,4017 ----
  					}
  				}
  
! 				if (members)
! 					pfree(members);
  			}
  
  			/*
*************** l3:
*** 4157,4163 ****
  			 * been the case, HeapTupleSatisfiesUpdate would have returned
  			 * MayBeUpdated and we wouldn't be here.
  			 */
! 			nmembers = GetMultiXactIdMembers(xwait, &members, false);
  
  			if (nmembers >= 0)
  			{
--- 4160,4168 ----
  			 * been the case, HeapTupleSatisfiesUpdate would have returned
  			 * MayBeUpdated and we wouldn't be here.
  			 */
! 			nmembers =
! 				GetMultiXactIdMembers(xwait, &members, false,
! 									  HEAP_XMAX_IS_LOCKED_ONLY(infomask));
  
  			if (nmembers >= 0)
  			{
*************** l5:
*** 4626,4632 ****
  	 * MultiXactIdExpand if we weren't to do this, so this check is not
  	 * incurring extra work anyhow.
  	 */
! 	if (!MultiXactIdIsRunning(xmax))
  	{
  		if (HEAP_XMAX_IS_LOCKED_ONLY(old_infomask) ||
  			TransactionIdDidAbort(MultiXactIdGetUpdateXid(xmax,
--- 4631,4637 ----
  	 * MultiXactIdExpand if we weren't to do this, so this check is not
  	 * incurring extra work anyhow.
  	 */
! 	if (!MultiXactIdIsRunning(xmax, HEAP_XMAX_IS_LOCKED_ONLY(old_infomask)))
  	{
  		if (HEAP_XMAX_IS_LOCKED_ONLY(old_infomask) ||
  			TransactionIdDidAbort(MultiXactIdGetUpdateXid(xmax,
*************** GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
*** 5217,5223 ****
  	 * We only use this in multis we just created, so they cannot be values
  	 * pre-pg_upgrade.
  	 */
! 	nmembers = GetMultiXactIdMembers(multi, &members, false);
  
  	for (i = 0; i < nmembers; i++)
  	{
--- 5222,5228 ----
  	 * We only use this in multis we just created, so they cannot be values
  	 * pre-pg_upgrade.
  	 */
! 	nmembers = GetMultiXactIdMembers(multi, &members, false, false);
  
  	for (i = 0; i < nmembers; i++)
  	{
*************** MultiXactIdGetUpdateXid(TransactionId xmax, uint16 t_infomask)
*** 5293,5299 ****
  	 * Since we know the LOCK_ONLY
[HACKERS] Another bug introduced by fastpath patch
In LockReleaseAll, we have this coding:

	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
	{
		LWLockId	partitionLock = FirstLockMgrLock + partition;
		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);

		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
											 offsetof(PROCLOCK, procLink));

		if (!proclock)
			continue;			/* needn't examine this partition */

		LWLockAcquire(partitionLock, LW_EXCLUSIVE);

		... process the proclock list ...

		LWLockRelease(partitionLock);
	}							/* loop over partitions */

That is, we check the proclock list head before acquiring the matching partitionLock, and assume we can skip scanning this partition if the list is empty. Once upon a time that was safe, but since the fast-path patch it's not very safe, because some other backend could be in process of promoting one of our fast-path locks (and hence changing the proclock list). If we could be sure that we had removed *all* our fastpath locks in the earlier loop then it'd be okay, but because LockReleaseAll's name is a bit of a misnomer, I don't think we can assume that.

The simplest fix is just to move the fetch of the list header down past the LWLockAcquire. This is pretty annoying, though, because in typical small transactions that will require a lot of additional partition lock acquire/releases. I think it might be safe to check the list header, skip the partition if it's null, but if it isn't then acquire the lock *and re-fetch the list header*. The idea here is that if there is someone else busy promoting one of our fastpath locks, it must be a lock we'd decided not to delete in the previous loop, and so we would again decide not to delete it in the main loop. Therefore, it doesn't matter if we decide to skip a partition microseconds before somebody changes the list header from null to not-null. However, once we've decided to process a partition, we'd better get an up-to-date view of the list state.

This does do some damage to the original concept that allLocks mode would guarantee to clean up all locks of the target lockmethod even if the locallock table was damaged. I wonder whether we shouldn't get rid of the existing logic about fastpath in the loop over locallocks entirely, and replace it with code that just zeroes the fastpath array when lockmethodid == DEFAULT_LOCKMETHOD and allLocks. That would likely be faster than what you've got here as well as more robust. Or we could add a restriction to EligibleForRelationFastPath that restricts the fast-path mechanism to non-session locks, in which case we'd not need to make the zeroing contingent on allLocks either. I don't think we take any fast-path-eligible locks in session mode anyway, so this wouldn't be giving up any functionality on that end.

Comments?

regards, tom lane
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Wed, Nov 27, 2013 at 04:44:02PM -0500, Bruce Momjian wrote:

I could live with this: Issuing <command>ROLLBACK</> outside of a transaction block has no effect except emitting a warning.

Proposed doc patch attached.

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
+ Everyone has their own god. +

diff --git a/doc/src/sgml/ref/abort.sgml b/doc/src/sgml/ref/abort.sgml
new file mode 100644
index f3a2fa8..ce70e7f
*** a/doc/src/sgml/ref/abort.sgml
--- b/doc/src/sgml/ref/abort.sgml
*************** ABORT [ WORK | TRANSACTION ]
*** 63,69 ****
   </para>
  
   <para>
!   Issuing <command>ABORT</> outside of a transaction block has no effect.
   </para>
  </refsect1>
--- 63,70 ----
   </para>
  
   <para>
!   Issuing <command>ABORT</> outside of a transaction block
!   has no effect except emitting a warning.
   </para>
  </refsect1>
diff --git a/doc/src/sgml/ref/rollback.sgml b/doc/src/sgml/ref/rollback.sgml
new file mode 100644
index 4f79621..3465d51
*** a/doc/src/sgml/ref/rollback.sgml
--- b/doc/src/sgml/ref/rollback.sgml
*************** ROLLBACK [ WORK | TRANSACTION ]
*** 60,66 ****
  
   <para>
    Issuing <command>ROLLBACK</> outside of a transaction
!   block has no effect.
   </para>
  </refsect1>
--- 60,66 ----
  
   <para>
    Issuing <command>ROLLBACK</> outside of a transaction
!   block has no effect except emitting a warning.
   </para>
  </refsect1>
diff --git a/doc/src/sgml/ref/set.sgml b/doc/src/sgml/ref/set.sgml
new file mode 100644
index 5a84f69..6ef06e6
*** a/doc/src/sgml/ref/set.sgml
--- b/doc/src/sgml/ref/set.sgml
*************** SET [ SESSION | LOCAL ] TIME ZONE { rep
*** 110,117 ****
       <para>
        Specifies that the command takes effect for only the current
        transaction.  After <command>COMMIT</> or <command>ROLLBACK</>,
!       the session-level setting takes effect again.  This has no effect
!       outside of a transaction block.
       </para>
      </listitem>
     </varlistentry>
--- 110,117 ----
       <para>
        Specifies that the command takes effect for only the current
        transaction.  After <command>COMMIT</> or <command>ROLLBACK</>,
!       the session-level setting takes effect again.  Issuing this
!       outside of a transaction block has no effect except emitting a warning.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/ref/set_constraints.sgml b/doc/src/sgml/ref/set_constraints.sgml
new file mode 100644
index a33190c..541a50b
*** a/doc/src/sgml/ref/set_constraints.sgml
--- b/doc/src/sgml/ref/set_constraints.sgml
*************** SET CONSTRAINTS { ALL | replaceable cla
*** 99,105 ****
  
   <para>
    This command only alters the behavior of constraints within the
!   current transaction.  This has no effect outside of a transaction block.
   </para>
  </refsect1>
--- 99,106 ----
  
   <para>
    This command only alters the behavior of constraints within the
!   current transaction.  Issuing this outside of a transaction block
!   has no effect except emitting a warning.
   </para>
  </refsect1>
diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml
new file mode 100644
index e90ff4a..1597657
*** a/doc/src/sgml/ref/set_transaction.sgml
--- b/doc/src/sgml/ref/set_transaction.sgml
*************** SET SESSION CHARACTERISTICS AS TRANSACTI
*** 185,191 ****
   <para>
    If <command>SET TRANSACTION</command> is executed without a prior
    <command>START TRANSACTION</command> or <command>BEGIN</command>,
!   it will have no effect.
   </para>
  
   <para>
--- 185,191 ----
   <para>
    If <command>SET TRANSACTION</command> is executed without a prior
    <command>START TRANSACTION</command> or <command>BEGIN</command>,
!   it has no effect except emitting a warning.
   </para>
  
   <para>
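For concreteness, here is what the documented behavior looks like in a psql session (warning texts as a 9.3-era server emits them):

ROLLBACK;
-- WARNING:  there is no transaction in progress
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- WARNING:  SET TRANSACTION can only be used in transaction blocks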
Re: [HACKERS] [RFC] overflow checks optimized away
On Mon, Sep 9, 2013 at 12:21:56PM -0400, Robert Haas wrote: On Sat, Sep 7, 2013 at 6:55 PM, Greg Stark st...@mit.edu wrote:

Should these patches be applied?

I have a copy of the program and was going to take care of this.

When? 2.5 months later, status report?

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Another bug introduced by fastpath patch
On 2013-11-27 17:25:44 -0500, Tom Lane wrote:

Or we could add a restriction to EligibleForRelationFastPath that restricts the fast-path mechanism to non-session locks, in which case we'd not need to make the zeroing contingent on allLocks either. I don't think we take any fast-path-eligible locks in session mode anyway, so this wouldn't be giving up any functionality on that end.

That seems like the best thing to do to me.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] MultiXact pessmization in 9.3
On 2013-11-27 19:24:35 -0300, Alvaro Herrera wrote:

One other thought is that MultiXactIdIsRunning and GetMultiXactIdMembers are public functions, so this patch would represent an API change in 9.3. I doubt any external modules would be relying on these functions, but there's so much care and thought put into avoiding API changes on released versions that I'm nervous about doing it here. So I think we'd need to provide a compatibility shim to avoid that.

-0.5 for providing compatibility shims for this. There really doesn't seem to be legitimate use for that API outside the guts of heapam.c, besides crude debugging hacks. So it seems like wasted effort.

(I generally dislike keeping compatibility stuff forever, so I would provide these backward-compatible functions in 9.3 only. Anyone using them would have to fix the code and recompile for 9.4+. This means a #ifdef in code meant to work on top of both 9.3 and 9.4. Anyone opine otherwise?)

If at all, we definitely should only do it for 9.3.

The other idea is to just not backpatch this.

I think backpatching is a good idea; I have seen GetMultiXactIdMembers() + slru code take up 80% of CPU in strange workloads. But it possibly might be a good idea to wait till after the next point release, to give people at least a minimal chance of catching problems.

Other than that, this patch implements the optimization suggested here.

Great!

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On 2013-11-27 13:57:52 -0300, Alvaro Herrera wrote:

Per bug report by J Smith in cadfupgc5bmtv-yg9znxv-vcfkb+jprqs7m2oesqxam_4z1j...@mail.gmail.com, diagnosed by Andres Freund.

Alvaro, do you see a way this could actually have caused J.'s problems? I thought about a few, but each turned out to not really be possible... I can easily see why we would uselessly wait or worse, but the pg_subtrans errors I can't really explain with it.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] MultiXact bugs
Andres Freund and...@2ndquadrant.com wrote:

HeapTupleHeaderGetUpdateXid() ignores aborted updaters and returns InvalidTransactionId in that case, but HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS...

That sure *sounds* like it should cause a problem for this code in CheckForSerializableConflictOut():

	htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
	switch (htsvResult)
	{
		[ ... ]
		case HEAPTUPLE_DELETE_IN_PROGRESS:
			xid = HeapTupleHeaderGetUpdateXid(tuple->t_data);
			break;
		[ ... ]
	}
	Assert(TransactionIdIsValid(xid));

... however, I have not been able to trigger that Assert even with gdb breakpoints at what I think are the right spots. Any suggestions?

How far back is it true that the above HeapTupleSatisfiesVacuum() can return HEAPTUPLE_DELETE_IN_PROGRESS but HeapTupleHeaderGetUpdateXid(tuple->t_data) on the exact same tuple structure can return InvalidTransactionId? Is there some other condition (besides a ROLLBACK of an UPDATE on the tuple being read) which needs to be met? Is any particular timing necessary?

--
Kevin Grittner  EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Modify the DECLARE CURSOR command tag depending on the scrollable flag
Peter Eisentraut pete...@gmx.net writes: On 11/27/13, 3:47 PM, Tom Lane wrote:

Given these considerations, I think it'd be better to allow explicit application control over whether read-ahead happens for a particular query. And I have no problem whatsoever with requiring that the cursor be explicitly marked SCROLL or NO SCROLL before read-ahead will occur.

Well, technically, unspecified means NO SCROLL according to the SQL standard. A lot of applications in ECPG are ported from other systems, which might make that assumption. It wouldn't be very nice to have to change all that.

Hm. So you're suggesting that ECPG fix this problem by inserting an explicit NO SCROLL clause into translated DECLARE CURSOR commands, if there's not a SCROLL clause? That would solve the problem of the ECPG library not being sure which behavior applies, but it might break existing apps that were unknowingly relying on a simple cursor being scrollable. OTOH any such app would be subject to breakage anyway as a result of planner changes, so it's hard to complain against this, as long as it's happening in a major version update. I'm for it.

regards, tom lane
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On Wed, Nov 27, 2013 at 10:43:05PM +0100, Andres Freund wrote: On 2013-11-27 14:53:27 -0500, Noah Misch wrote:

How would you characterize the chances of this happening with default *vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter this bug during each of ~10 generations of autovacuum_freeze_max_age before the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen quite a bit more often. But in some workloads it's pretty darn easy to hit. E.g. if significant parts of the table are regularly deleted, lots, if not most, of your vacuums will spuriously increase relfrozenxid above the actual value. Each time only by a small amount, but due to that small increase there never will be an actual full table vacuum since freeze_table_age will never even remotely be reached.

That makes sense.

Maximizing detection is valuable, and the prognosis for automated repair is poor. I would want a way to extract tuples having xmin outside the range of CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page.

I think the likelihood of the problem affecting !all-visible pages is close to zero. Each vacuum will try to clean those, so they surely will get vacuumed at some point. I think the only way that could happen is if the ConditionalLockBufferForCleanup() fails in each vacuum. And that seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say) until a recent UPDATE.

At first, I supposed we could offer a tool to blindly freeze such tuples. However, there's no guarantee that they are in harmony with recent changes to the database; transactions that wrongly considered those tuples invisible may have made decisions incompatible with their existence. For example, reviving such a tuple could violate a UNIQUE constraint if the user had already replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential scans will currently already find those. It's primarily index scans that won't. So it's not really reviving them...

True. Since a dump/reload of the database would already get the duplicate key violation, the revival is not making anything clearly worse. And if we hope for manual repair, many DBAs just won't do that at all.

The primary reason why I think it might be a good idea to revive automatically is that an eventual full-table/freeze vacuum will currently delete them, which seems bad.

Will it? When the page became all-visible, the tuples were all hinted. They will never be considered dead. Every 2B transactions, they will alternate between live and not-yet-committed.

--
Noah Misch  EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] MultiXact bugs
On 2013-11-27 15:14:11 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote:

HeapTupleHeaderGetUpdateXid() ignores aborted updaters and returns InvalidTransactionId in that case, but HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS...

That sure *sounds* like it should cause a problem for this code in CheckForSerializableConflictOut():

Yea. IMNSHO the current state is an API design flaw. We really should be returning the aborted xid, since that's what happens for non-multixact ids.

	htsvResult = HeapTupleSatisfiesVacuum(tuple, TransactionXmin, buffer);
	switch (htsvResult)
	{
		[ ... ]
		case HEAPTUPLE_DELETE_IN_PROGRESS:
			xid = HeapTupleHeaderGetUpdateXid(tuple->t_data);
			break;
		[ ... ]
	}
	Assert(TransactionIdIsValid(xid));

... however, I have not been able to trigger that Assert even with gdb breakpoints at what I think are the right spots. Any suggestions? How far back is it true that the above HeapTupleSatisfiesVacuum() can return HEAPTUPLE_DELETE_IN_PROGRESS but HeapTupleHeaderGetUpdateXid(tuple->t_data) on the exact same tuple structure can return InvalidTransactionId?

What do you mean with "how far back"?

Is there some other condition (besides a ROLLBACK of an UPDATE on the tuple being read) which needs to be met? Is any particular timing necessary?

Afaics you need a multixact consisting out of a) the updater and b) a lock. That's probably easiest to get if you update a row in one session without changing the primary key, and then key-share lock it in another. Or the other way round. Then abort the updater.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
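That recipe, spelled out as SQL (table name and values are invented; run the two sessions concurrently):

-- setup
CREATE TABLE t (id int PRIMARY KEY, val int);
INSERT INTO t VALUES (1, 0);

-- session 1: update the row without touching the key column
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;

-- session 2: key-share lock the same row; xmax becomes a multixact
-- holding the updater plus the locker
BEGIN;
SELECT * FROM t WHERE id = 1 FOR KEY SHARE;

-- session 1: abort, leaving an aborted updater in the multixact
ROLLBACK;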
Re: [HACKERS] MultiXact pessmization in 9.3
Andres Freund and...@2ndquadrant.com writes: On 2013-11-27 19:24:35 -0300, Alvaro Herrera wrote:

The other idea is to just not backpatch this.

I think backpatching is a good idea, I have seen GetMultiXactIdMembers() + slru code take up 80% cpu in strange workloads. But it possibly might be a good idea to wait till after the next point release to give people at least a minimal chance of catching problems.

Agreed on both counts. We're close enough now to Monday's wrap that we should avoid anything that risks destabilizing 9.3.x, unless it's to fix a serious bug. AIUI this is just a performance issue, so let's wait till after 9.3.2 is done to push in the fix. But since it is a performance regression from pre-9.3, never back-patching the fix at all isn't very attractive.

regards, tom lane
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
Bruce Momjian br...@momjian.us wrote: On Wed, Nov 27, 2013 at 06:05:13AM -0800, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote:

I am confused what we are patching. Are we patching pg_dump, pg_dumpall, or both?

Just pg_dumpall.c.

OK, there was a pg_dump patch earlier which we are not using now.

Right, that was v1, early in the discussion. This is v3, based on responding to the discussion on this thread.

What is the logic that has us setting statement_timeout in pg_dump but default_transaction_read_only in pg_dumpall?

Of the people who posted on this thread supporting that, I think Tom said it best:

Tom Lane t...@sss.pgh.pa.us wrote:

I'm inclined to agree with Kevin that this behavior is wrong and should be fixed (and back-patched), so far as pg_dumpall is concerned. pg_dumpall's charter is to be able to recreate a database cluster's contents in a virgin installation, but it's failing to honor that contract if the cluster has any ALTER DATABASE SET default_read_only settings. Similarly, I think it's reasonable to try to make pg_upgrade cope with the case. I also agree with *not* changing pg_dump, since it is not the charter of pg_dump to recreate a whole cluster, and the objection about possibly restoring into a database that was meant to be protected by this setting seems to have some force.

For example, I have seen dumps accidentally restored to the postgres database on multiple occasions. You might, for example, flag the postgres database with this, and thereby block such accidents. The patch as it stands would allow pg_dumpall to replicate such a cluster, flag and all. Without the patch you get many errors.

It is also much easier to work around with pg_dump output. You could get a psql connection to a database, set this off for the connection, and use \i to read the pg_dump output file. Or you could concatenate a SET statement in front of the pg_dump output when piping it in. There is no correspondingly easy solution for pg_dumpall.

--
Kevin Grittner  EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
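The pg_dump-side workaround described above, spelled out as a psql session (the dump file name is hypothetical):

\connect mydb
SET default_transaction_read_only = off;
\i dump.sql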
Re: [HACKERS] MultiXact bugs
Andres Freund and...@2ndquadrant.com wrote: On 2013-11-27 15:14:11 -0800, Kevin Grittner wrote:

... however, I have not been able to trigger that Assert even with gdb breakpoints at what I think are the right spots. Any suggestions? How far back is it true that the above HeapTupleSatisfiesVacuum() can return HEAPTUPLE_DELETE_IN_PROGRESS but HeapTupleHeaderGetUpdateXid(tuple->t_data) on the exact same tuple structure can return InvalidTransactionId?

What do you mean with "how far back"?

What back-patching will be needed for a fix? It sounds like 9.3?

Afaics you need a multixact consisting out of a) the updater and b) a lock. That's probably easiest to get if you update a row in one session without changing the primary key, and then key-share lock it in another. Or the other way round. Then abort the updater.

Thanks! I'll keep trying to generate a failure at that point.

--
Kevin Grittner  EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] MultiXact bugs
On 2013-11-27 15:42:11 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote:

What do you mean with "how far back"?

What back-patching will be needed for a fix? It sounds like 9.3?

Yep.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Status of FDW pushdowns
Atri Sharma atri.j...@gmail.com writes:

Can we add a function to the FDW API to define a SQL to foreign server side conversion?

I think that's not tenable. Even if you limit the discussion to postgres_fdw, some queries against a past version will stop working against a new version (keyword changes, catalogs, default settings, etc). I don't think you want to embed a full parser for every supported FOREIGN version of PostgreSQL inside the postgres_fdw code, so I think the text of the view needs to be an opaque string.

--
Dimitri Fontaine  http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Another bug introduced by fastpath patch
Andres Freund and...@2ndquadrant.com writes: On 2013-11-27 17:25:44 -0500, Tom Lane wrote:

Or we could add a restriction to EligibleForRelationFastPath that restricts the fast-path mechanism to non-session locks, in which case we'd not need to make the zeroing contingent on allLocks either. I don't think we take any fast-path-eligible locks in session mode anyway, so this wouldn't be giving up any functionality on that end.

That seems like the best thing to do to me.

I poked a bit more at this and realized that what I had in mind above doesn't work: a session lock isn't a distinct lock type, just a different owner, and moreover usually we ask for a session lock on a rel that we already have locked normally. So it doesn't work to say "skip trying fast path if sessionLock" --- we typically won't get that far in the code. We could conceivably make it work if we were willing to forcibly promote an existing fastpath lock to regular when a session lock gets asked for, but that's complication I don't want much, especially since it would go completely untested in normal use. (I verified the claim above that we never take session locks on any fast-path-eligible lock types.)

We could still do this if we were willing to actually reject requests for session-level locks on fast-path-eligible lock types. At the moment that costs us nothing really. If anyone ever did have a use case, we could consider adding the extra logic to support it.

regards, tom lane
Re: [HACKERS] Status of FDW pushdowns
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Atri Sharma atri.j...@gmail.com writes:

Can we add a function to the FDW API to define a SQL to foreign server side conversion?

I think that's not tenable. Even if you limit the discussion to postgres_fdw, some queries against a past version will stop working against a new version (keyword changes, catalogs, default settings, etc). I don't think you want to embed a full parser for every supported FOREIGN version of PostgreSQL inside the postgres_fdw code, so I think the text of the view needs to be an opaque string.

I'm not real sure what this'd buy us that wouldn't be done as well or better by creating a view on the remote side. (IOW, there's nothing that says that the remote object backing a foreign table can't be a view.)

regards, tom lane
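A sketch of that arrangement with postgres_fdw (all object names here are invented):

-- on the remote server: wrap the query in an ordinary view
CREATE VIEW order_summary AS
    SELECT customer_id, count(*) AS n_orders
    FROM orders
    GROUP BY customer_id;

-- locally: the foreign table simply points at the view
CREATE FOREIGN TABLE order_summary (customer_id int, n_orders bigint)
    SERVER remote_pg OPTIONS (table_name 'order_summary');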
Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum
On 2013-11-27 18:18:02 -0500, Noah Misch wrote:

I think the likelihood of the problem affecting !all-visible pages is close to zero. Each vacuum will try to clean those, so they surely will get vacuumed at some point. I think the only way that could happen is if the ConditionalLockBufferForCleanup() fails in each vacuum. And that seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say) until a recent UPDATE.

Good point.

At first, I supposed we could offer a tool to blindly freeze such tuples. However, there's no guarantee that they are in harmony with recent changes to the database; transactions that wrongly considered those tuples invisible may have made decisions incompatible with their existence. For example, reviving such a tuple could violate a UNIQUE constraint if the user had already replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential scans will currently already find those. It's primarily index scans that won't. So it's not really reviving them...

True. Since a dump/reload of the database would already get the duplicate key violation, the revival is not making anything clearly worse. And if we hope for manual repair, many DBAs just won't do that at all.

Especially if it involves compiling C code...

The primary reason why I think it might be a good idea to revive automatically is that an eventual full-table/freeze vacuum will currently delete them, which seems bad.

Will it? When the page became all-visible, the tuples were all hinted. They will never be considered dead. Every 2B transactions, they will alternate between live and not-yet-committed.

Good point again. And pretty damn good luck. Although 9.3+ multixact rows look like they could return _DEAD, since we'll do a TransactionIdDidAbort() (via GetUpdateXid()) and TransactionIdDidCommit() in there, and we don't set XMAX_COMMITTED hint bits for XMAX_IS_MULTI rows. As an additional problem, once multixact.c has pruned the old multis away, we'll get errors from there on. But that's less likely to affect many rows.

Greetings, Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
On 11/26/13, 5:14 PM, Kevin Grittner wrote:

I happened to build in a shell that was still set up for the clang address sanitizer, and got the attached report. On a rerun it was repeatable. XLogInsert() seems to read past the end of a variable allocated on the stack in doPickSplit(). I haven't tried to analyze it past that, since this part of the code is unfamiliar to me.

I also see that. It only happens in 64-bit builds.
[HACKERS] Proposed feature: Selective Foreign Keys
Hi all!

The Problem
-----------

One case that traditional SQL doesn't handle very well is when you have a child entity which can be attached to a number of different parent entities. Examples might be comments, tags or file attachments - we might have 20 different entities in the system that we would like our users to be able to add comments to, but the existing solutions for mapping this all have downsides.

Existing solution 1: Join tables ahoy

If I can have a list of comments on every other object in the system, and I want to have referential integrity, then the obvious thing to do is create a join table between each entity and the comments table.

Pros:
- Straightforward, traditional object-with-collection-of-child SQL structure

Cons:
- If a parent object gets deleted here, we can't use foreign keys to delete e.g. a child comment, so we'll have to either explicitly do it as part of our delete logic or have a cleanup process to catch orphans. Or do a dance with delete triggers on the join tables deleting the comment.
- For n entities requiring comments in the system, we need n join tables. If we want both comments and e.g. tags and likes on all of our entities, we now have 3n join tables for what should be some relatively self-contained on-the-side data - this could be more tables than the entire rest of the system.
- It's difficult to create any kind of self-contained component for building applications in this scenario, as it will need to know about every other entity in the system, or be able to poke around inside whatever ORM or data access system you have to work out what join tables it needs when running queries.

Existing solution 2: Enter the matrix

Instead of having n join tables, let's just mash them all together, with a column per parent object, and a check constraint to force exactly one of those columns to be set.

Pros:
- Less bloat in the number of tables

Cons:
- Doesn't solve the orphan problem
- Addition of a new entity which needs comments means we now have to add another column onto it, potentially rewriting the whole thing
- Ugly

Existing solution 3: Embed the matrix

Kinda like the dependency matrix table, except that all the columns referencing potential parent objects go into the comment table instead.

Pros:
- Everything contained in the comment table
- No orphans, since cascaded deletes will now delete the actual comment

Cons:
- The comment table now has references to every single type that it may be attached to
- Addition of a new entity and we probably have to rewrite the comment table now

Existing solution 4: Abandon ye all referential integrity

Have a column indicating parent type and another one for the id. In the case of comments this would be directly on the comment table itself. In the case of something like tags that we might expect to be shared between entities, it would be in a single join table.

Pros:
- Pretty self-contained
- Data model which has neither lots of empty columns nor lots of tables
- Can make new entities commentable without rewriting anything
- Because it's self-contained, you can build application components that don't need to know much about the rest of your system. For example, this is the approach that the grails taggable and commentable plugins take.
Cons:
- No referential integrity, since we can't have a single column pointing to different tables with the existing foreign key infrastructure
- Since there's no real db support for doing things this way, existing ORMs etc don't really know how to use a single column to join against multiple different tables based on a discriminator or 'switch' column

Existing solution 5: Everything's a thing

Make your entity hierarchy have a base-level object which can have comments attached, and then everything that you need to be commentable has to extend that. You can do that in an ORM, or with table inheritance in the database.

Pros:
- Single top-level thing to hang your data on

Cons:
- You've polluted your object hierarchy just to hang some stuff off of the end of it, rather than it being driven by behaviours
- You're going to be paying a performance penalty - everything that extends that base-level object will now join against it incessantly, and you now have a global id sequence or whatever that you may not want

Basically none of the above handle the situation very well. The cleanest is solution 4, but the lack of RI sucks.

Feature Proposal: Selective foreign keys
----------------------------------------

Allow foreign keys to have WHERE clauses. The above comment example using solution 4 might then look like the following:

CREATE TABLE comment (
    id bigserial primary key,
    content text not null,
    parent_entity regclass not null,
    parent_id int8
);

ALTER TABLE comment ADD CONSTRAINT comment_blog_fk
    FOREIGN KEY (parent_id) REFERENCES blog (id)
    WHERE (parent_entity = 'blog');

ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY
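Under the proposed semantics, each predicated constraint would presumably be enforced only for rows satisfying its WHERE clause. A hypothetical sketch of the resulting behavior (assuming a blog table whose only row has id 42; the syntax above does not exist yet):

INSERT INTO comment (content, parent_entity, parent_id)
VALUES ('nice post', 'blog', 42);   -- matches comment_blog_fk's predicate and passes

INSERT INTO comment (content, parent_entity, parent_id)
VALUES ('dangling', 'blog', 99);    -- would fail comment_blog_fk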
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Wed, Nov 27, 2013 at 1:09 AM, Peter Geoghegan p...@heroku.com wrote:

This, along with the already-discussed "attempted to update invisible tuple" error, forms a full account of unexpected ERRORs seen during the extended run of the test case, so far.

Actually, it was slightly misleading of me to say it's the same test case; in fact, this time I ran each pgbench run with a variable, random number of seconds between 2 and 20 inclusive (as opposed to always 2 seconds). If you happen to need help recreating this, I am happy to give it.

--
Peter Geoghegan
Re: [HACKERS] Another bug introduced by fastpath patch
I wrote:

We could still do this if we were willing to actually reject requests for session level locks on fast-path-eligible lock types. At the moment that costs us nothing really. If anyone ever did have a use case, we could consider adding the extra logic to support it.

Nope, that *still* doesn't work, because in non-allLocks mode the main loop won't clear any locks that have been promoted from fastpath to regular. Sigh. For the moment I'm proposing that we just re-fetch the list header after acquiring the lock. The attached patch is slightly more verbose than that, because I took the opportunity to reformulate the while() loop as a for() loop and thereby eliminate some goto's.

regards, tom lane

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 072b276..75df457 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
*************** LockReleaseAll(LOCKMETHODID lockmethodid
*** 2098,2116 ****
  	{
  		LWLockId	partitionLock = FirstLockMgrLock + partition;
  		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
  
! 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
! 											 offsetof(PROCLOCK, procLink));
! 
! 		if (!proclock)
  			continue;			/* needn't examine this partition */
  
  		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
  
! 		while (proclock)
  		{
  			bool		wakeupNeeded = false;
- 			PROCLOCK   *nextplock;
  
  			/* Get link first, since we may unlink/delete this proclock */
  			nextplock = (PROCLOCK *)
--- 2098,2134 ----
  	{
  		LWLockId	partitionLock = FirstLockMgrLock + partition;
  		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
+ 		PROCLOCK   *nextplock;
  
! 		/*
! 		 * If the proclock list for this partition is empty, we can skip
! 		 * acquiring the partition lock.  This optimization is trickier than
! 		 * it looks, because another backend could be in process of adding
! 		 * something to our proclock list due to promoting one of our
! 		 * fast-path locks.  However, any such lock must be one that we
! 		 * decided not to delete above, so it's okay to skip it again now;
! 		 * we'd just decide not to delete it again.  We must, however, be
! 		 * careful to re-fetch the list header once we've acquired the
! 		 * partition lock, to be sure we see the up to date version.
! 		 *
! 		 * XXX This argument assumes that the locallock table correctly
! 		 * represents all of our fast-path locks.  While allLocks mode
! 		 * guarantees to clean up all of our normal locks regardless of the
! 		 * locallock situation, we lose that guarantee for fast-path locks.
! 		 * This is not ideal.
! 		 */
! 		if (SHMQueueNext(procLocks, procLocks,
! 						 offsetof(PROCLOCK, procLink)) == NULL)
  			continue;			/* needn't examine this partition */
  
  		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
  
! 		for (proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
! 												  offsetof(PROCLOCK, procLink));
! 			 proclock;
! 			 proclock = nextplock)
  		{
  			bool		wakeupNeeded = false;
  
  			/* Get link first, since we may unlink/delete this proclock */
  			nextplock = (PROCLOCK *)
*************** LockReleaseAll(LOCKMETHODID lockmethodid
*** 2123,2129 ****
  
  			/* Ignore items that are not of the lockmethod to be removed */
  			if (LOCK_LOCKMETHOD(*lock) != lockmethodid)
! 				goto next_item;
  
  			/*
  			 * In allLocks mode, force release of all locks even if locallock
--- 2141,2147 ----
  
  			/* Ignore items that are not of the lockmethod to be removed */
  			if (LOCK_LOCKMETHOD(*lock) != lockmethodid)
! 				continue;
  
  			/*
  			 * In allLocks mode, force release of all locks even if locallock
*************** LockReleaseAll(LOCKMETHODID lockmethodid
*** 2139,2145 ****
  			 * holdMask == 0 and are therefore recyclable
  			 */
  			if (proclock->releaseMask == 0 && proclock->holdMask != 0)
! 				goto next_item;
  
  			PROCLOCK_PRINT("LockReleaseAll", proclock);
  			LOCK_PRINT("LockReleaseAll", lock, 0);
--- 2157,2163 ----
  			 * holdMask == 0 and are therefore recyclable
  			 */
  			if (proclock->releaseMask == 0 && proclock->holdMask != 0)
! 				continue;
  
  			PROCLOCK_PRINT("LockReleaseAll", proclock);
  			LOCK_PRINT("LockReleaseAll", lock, 0);
*************** LockReleaseAll(LOCKMETHODID lockmethodid
*** 2168,2176 ****
  								lockMethodTable,
  								LockTagHashCode(&lock->tag),
  								wakeupNeeded);
- 
- 	next_item:
- 			proclock = nextplock;
  		}						/* loop over PROCLOCKs within this partition */
  
  		LWLockRelease(partitionLock);
--- 2186,2191 ----
*************** PostPrepare_Locks(TransactionId xid)
*** 3142,3160 ****
  	{
  		LWLockId	partitionLock = FirstLockMgrLock + partition;
  		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
  
! 		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
! 											 offsetof(PROCLOCK, procLink));
! 
! 		if (!proclock)
  			continue;			/* needn't examine this partition */
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Wed, Nov 27, 2013 at 03:36:12PM -0800, Kevin Grittner wrote:

Of the people who posted on this thread supporting that, I think Tom said it best. Tom Lane t...@sss.pgh.pa.us wrote:

I'm inclined to agree with Kevin that this behavior is wrong and should be fixed (and back-patched), so far as pg_dumpall is concerned. pg_dumpall's charter is to be able to recreate a database cluster's contents in a virgin installation, but it's failing to honor that contract if the cluster has any ALTER DATABASE SET default_transaction_read_only settings. Similarly, I think it's reasonable to try to make pg_upgrade cope with the case. I also agree with *not* changing pg_dump, since it is not the charter of pg_dump to recreate a whole cluster, and the objection about possibly restoring into a database that was meant to be protected by this setting seems to have some force.

For example, I have seen dumps accidentally restored to the postgres database on multiple occasions. You might, for example, flag the postgres database with this setting and thereby block such accidents. The patch as it stands would allow pg_dumpall to replicate such a cluster, flag and all. Without the patch you get many errors.

It is also much easier to work around with pg_dump output. You could open a psql connection to the database, turn the setting off for that connection, and use \i to read the pg_dump output file. Or you could concatenate a SET statement in front of the pg_dump output when piping it in (see the sketch below). There is no correspondingly easy solution for pg_dumpall.

Well, I can understand that, but part of the argument was that default_transaction_read_only is not part of the database, but rather just the transaction default. Karsten wrote:

Maybe I am splitting hairs, but setting transactions to read-only by default does not mean the database *as such* is to be read-only. It literally applies to the *default* state of transactions (as opposed to the ONLY state of transactions). No more, no less.

I ask again! What is the logic that has us setting statement_timeout in pg_dump but default_transaction_read_only in pg_dumpall? Why can't I get an answer to that question? Is it that statement_timeout is less likely to lead to a restore failure? Are all the other settings output by pg_dump safe? Is only default_transaction_read_only a problem? Whatever the answer is, the patch should explain why we are singling out default_transaction_read_only for pg_dumpall while leaving everything else in pg_dump.

Why does it feel like I am going around in circles here? I feel like I am reliving the materialized view record comparison thread all over again. :-(

-- 
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
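A minimal sketch of the pg_dump workaround described above (the database name mydb and file name dump.sql are hypothetical, not from the thread):

    -- inside psql, in a fresh session:
    \connect mydb
    -- a session-level SET overrides the ALTER DATABASE-level default
    SET default_transaction_read_only = off;
    -- replay the pg_dump output within this same session
    \i dump.sql

The piping variant works the same way: concatenate the SET statement ahead of the dump file and feed both to a single psql invocation, so the override is in effect when the dump's statements run.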
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Wed, Nov 27, 2013 at 03:36:12PM -0800, Kevin Grittner wrote:

Tom Lane t...@sss.pgh.pa.us wrote:

I'm inclined to agree with Kevin that this behavior is wrong and should be fixed (and back-patched), so far as pg_dumpall is concerned. pg_dumpall's charter is to be able to recreate a database cluster's contents in a virgin installation, but it's failing to honor that contract if the cluster has any ALTER DATABASE SET default_transaction_read_only settings. Similarly, I think it's reasonable to try to make pg_upgrade cope with the case. I also agree with *not* changing pg_dump, since it is not the charter of pg_dump to recreate a whole cluster, and the objection about possibly restoring into a database that was meant to be protected by this setting seems to have some force.

For example, I have seen dumps accidentally restored to the postgres database on multiple occasions. You might, for example, flag the postgres database with this setting and thereby block such accidents. The patch as it stands would allow pg_dumpall to replicate such a cluster, flag and all. Without the patch you get many errors.

It is also much easier to work around with pg_dump output. You could open a psql connection to the database, turn the setting off for that connection, and use \i to read the pg_dump output file. Or you could concatenate a SET statement in front of the pg_dump output when piping it in. There is no correspondingly easy solution for pg_dumpall.

OK, I have read through this again, and now I see the idea is that we are trying to have pg_dumpall override the default_transaction_read_only value while keeping it in force for pg_dump restores (see the sketch below). That is a pretty tricky use-case, so I think we need to mention this in the C comments. Tom and you think we should back-patch, and it seems only I am against it, so I suppose you can move forward. We are packaging soon for a minor release.

-- 
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
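To make that distinction concrete, a hedged sketch of the intended semantics (database and table names are hypothetical; this is not the committed patch's output): a pg_dumpall-style script can recreate the read-only-by-default flag and still load data, because a session-level SET overrides the per-database default installed by ALTER DATABASE:

    CREATE DATABASE accounting;
    ALTER DATABASE accounting SET default_transaction_read_only = on;
    \connect accounting
    -- the restore session turns the default off for itself only;
    -- the ALTER DATABASE flag remains for all future sessions
    SET default_transaction_read_only = off;
    CREATE TABLE ledger (entry_id int PRIMARY KEY, amount numeric);
    -- ...remaining objects and data load here, read-write...

Plain pg_dump output carries no such override, so restoring it into a database flagged this way keeps failing unless the user disables the setting for their own session, which preserves the protective behavior discussed above.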