Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator
2011/7/22 Jim Nasby:
> On Jul 19, 2011, at 10:51 PM, Pavel Stehule wrote:
>>> If you mean that such checks would be done automatically, no, they
>>> shouldn't be. Consider a function that creates a table and then uses
>>> it, or even just depends on using a table that doesn't yet exist when
>>> you do CREATE FUNCTION.
>>
>> yes, any deep check is not possible for a function that uses temporary
>> tables.
>>
>> A plpgsql_lint is not a silver bullet - for these cases it is necessary
>> to disable the lint.
>>
>> I can't speak generally - I have no idea what percentage of functions
>> access temporary tables - in my last project I used no temp tables in
>> circa 300 KB of plpgsql code.
>>
>> The more troublesome problem is a new dependency between functions. I
>> use a workaround - something like headers.
>
> You can work around temp table issues the same way: just define the temp
> table before you create the function.
>
> In practice, if I have a function that depends on a temp table it either
> creates it itself if it doesn't already exist or I have a separate function
> to create the table; that way you have a single place that has the temp
> table definition, and that is in the database itself.

There is another trick - create a persistent table with the same name
beforehand. At runtime the temporary table is earlier in search_path, so
all executed SQL will resolve to the temp table.

Pavel

> --
> Jim C. Nasby, Database Architect   j...@nasby.net
> 512.569.9461 (cell)                http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 9:19 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Jul 21, 2011 at 6:43 PM, Noah Misch wrote:
>>> On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote:
>>>> SIGetDataEntries() can pretty easily be made lock-free. The only real
>>>> changes that seem to be needed are (1) to use a 64-bit counter, so
>>>> you never need to decrement
>>
>>> On second thought, won't this be inadequate on 32-bit systems, where
>>> updating the 64-bit counter produces two stores? You must avoid
>>> reading it between those stores.
>
>> Now that is a potentially big problem.
>
> Could we do something similar to the xxid hacks? That is, we have a lot
> of counters that should be fairly close to each other, so we store only
> the low-order 32 bits of each notional value, and separately maintain a
> common high-order word. You probably would need some additional
> overhead each time the high-order word bumps, but that's reasonably
> infrequent.

Well, the trouble is figuring out what the shape of that additional
overhead needs to look like.

I think I have a simpler idea, though: before acquiring any locks, just
have SIGetDataEntries() do this:

+	if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
+		return 0;

Patch (with comment explaining why I think this is OK) attached. If the
message numbers happen to be equal only because the counter has wrapped,
then stateP->resetState will be true, so we'll still realize we need to
do some work.
Test results, with the lazy vxid patch plus this patch, at 8 clients:

tps = 34028.144439 (including connections establishing)
tps = 34079.085935 (including connections establishing)
tps = 34125.295938 (including connections establishing)

And at 32 clients:

tps = 185521.605364 (including connections establishing)
tps = 188250.700451 (including connections establishing)
tps = 186077.847215 (including connections establishing)

And at 80 clients:

tps = 188568.886569 (including connections establishing)
tps = 191035.971512 (including connections establishing)
tps = 189363.019377 (including connections establishing)

Not quite as good as the unlocked version, but better than the per-backend
mutex, and a whole lot simpler.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

sinval-fastpath.patch
Description: Binary data
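The proposed check can be modeled in a few lines. This is an illustrative reduction, not the real sinvaladt.c structs, and the slow-path return values are invented for the sketch:

```c
#include <stdbool.h>

/* Simplified stand-ins for the shared sinval segment and one backend's
 * per-process state; the real structs in sinvaladt.c carry much more. */
typedef struct { int maxMsgNum; } SISeg;
typedef struct { int nextMsgNum; bool resetState; } ProcState;

int
SIGetDataEntries(SISeg *segP, ProcState *stateP)
{
	/* Fast path: we have caught up and were not reset, so there is
	 * nothing to read and no need to take any locks.  If the counters
	 * are equal only because the counter wrapped, resetState will have
	 * been set, and we fall through to the slow path. */
	if (stateP->nextMsgNum == segP->maxMsgNum && !stateP->resetState)
		return 0;

	if (stateP->resetState)
		return -1;				/* caller must invalidate everything */

	/* ... slow path: take locks, copy the pending messages ... */
	return segP->maxMsgNum - stateP->nextMsgNum;
}
```

The point of the benchmark numbers above is that this single unlocked comparison short-circuits the common case in which nothing has been invalidated since the last call.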
[HACKERS] libedit memory stomp is apparently fixed in OS X Lion
We've had several complaints about that tab-completion bug in the
Apple-supplied version of libedit, most recently here:
http://archives.postgresql.org/pgsql-bugs/2011-06/msg00119.php

I had a bug filed with Apple about that, and today I got some auto-mail
indicating they'd fixed that bug as of OS X 10.7 (Lion). I don't have Lion
installed here, but I grabbed the libedit sources from
www.opensource.apple.com and indeed it looks fixed.

So, if any early adopters want to try it out ...

regards, tom lane
Re: [HACKERS] sinval synchronization considered harmful
Robert Haas writes:
> On Thu, Jul 21, 2011 at 6:43 PM, Noah Misch wrote:
>> On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote:
>>> SIGetDataEntries() can pretty easily be made lock-free. The only real
>>> changes that seem to be needed are (1) to use a 64-bit counter, so
>>> you never need to decrement

>> On second thought, won't this be inadequate on 32-bit systems, where
>> updating the 64-bit counter produces two stores? You must avoid
>> reading it between those stores.

> Now that is a potentially big problem.

Could we do something similar to the xxid hacks? That is, we have a lot
of counters that should be fairly close to each other, so we store only
the low-order 32 bits of each notional value, and separately maintain a
common high-order word. You probably would need some additional
overhead each time the high-order word bumps, but that's reasonably
infrequent.

regards, tom lane
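Tom's split-counter idea can be sketched as follows, with invented names; the real xxid code differs, and the synchronization needed when the high word bumps (the "additional overhead" he mentions) is deliberately elided:

```c
#include <stdint.h>

#define NUM_COUNTERS 64

/* Shared state: one 32-bit low word per counter, plus a single common
 * high-order word.  The scheme assumes all counters stay close enough
 * together that they share the same high word, modulo the bump-time
 * handling that is not shown here. */
static uint32_t counterLow[NUM_COUNTERS];
static uint32_t commonHigh;

/* Reconstruct the notional 64-bit value of counter n from the two words.
 * Each word is read with a single aligned 32-bit load, so no torn reads
 * are possible even on 32-bit hardware -- which is the whole point. */
uint64_t
counter_value(int n)
{
	return ((uint64_t) commonHigh << 32) | counterLow[n];
}
```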
Re: [HACKERS] psql: bogus descriptions displayed by \d+
On Sun, Jul 17, 2011 at 10:54 AM, Josh Kupershmidt wrote:
> On Sat, Jul 16, 2011 at 12:49 PM, Tom Lane wrote:
>> After a bit of review of the archives, the somebody was me:
>> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=b7d67954456f15762c04e5269b64adc88dcd0860
>>
>> and this thread was the discussion about it:
>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01982.php
>>
>> It looks like we thought about pg_dump, but did not think about psql.
>
> Ah, interesting. I didn't even know this functionality existed. And I
> think there is some documentation lacking; in the 8.4 doc page:

Here's a small patch against branch 8.4 to mention support for COMMENT ON
index_name.column_name.

Also, a patch against master to:
 * get rid of the bogus "Description" outputs for \d+ sequence_name and
   \d+ index_name
 * clarify in the COMMENT ON doc page that a table _or view_ name may be
   used for comments on columns, rules, and triggers. If we allowed
   constraints on views, we could have just put in a note explaining that
   "table_name.column_name" applies to tables and views, but constraints
   are the odd man out.
 * slightly reorder the listing of the first bunch of Parameters on that
   page so that agg_name comes first, as it does in the Synopsis section

I noticed that the synopsis for CREATE RULE:
http://www.postgresql.org/docs/9.1/static/sql-createrule.html
uses the term "table", which could be a similar omission. However, on that
page the first sentence of the description specifies "table or view", so
it might be fine as-is.

And while I'm messing with this, some further nitpicks about psql not
addressed by these patches:
 * The "Storage" column for \d+ sequence_name is correct, I suppose, but
   repetitive
 * The "Type" column for \dv+ view_name, \di+ index_name,
   \ds+ sequence_name, etc. seems borderline useless: shouldn't you know
   what type you're looking at based on the backslash command you're
   using?
Plus the table heading could be more specific than "List of relations",
e.g. "List of views".

Josh

diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml
index ab12614..58a2f02 100644
*** a/doc/src/sgml/ref/comment.sgml
--- b/doc/src/sgml/ref/comment.sgml
*************** COMMENT ON
*** 26,32 ****
  AGGREGATE agg_name (agg_type [, ...] ) |
  CAST (source_type AS target_type) |
  COLLATION object_name |
! COLUMN table_name.column_name |
  CONSTRAINT constraint_name ON table_name |
  CONVERSION object_name |
  DATABASE object_name |
--- 26,32 ----
  AGGREGATE agg_name (agg_type [, ...] ) |
  CAST (source_type AS target_type) |
  COLLATION object_name |
! COLUMN table_or_view_name.column_name |
  CONSTRAINT constraint_name ON table_name |
  CONVERSION object_name |
  DATABASE object_name |
*************** COMMENT ON
*** 42,48 ****
  OPERATOR FAMILY object_name USING index_method |
  [ PROCEDURAL ] LANGUAGE object_name |
  ROLE object_name |
! RULE rule_name ON table_name |
  SCHEMA object_name |
  SEQUENCE object_name |
  SERVER object_name |
--- 42,48 ----
  OPERATOR FAMILY object_name USING index_method |
  [ PROCEDURAL ] LANGUAGE object_name |
  ROLE object_name |
! RULE rule_name ON table_or_view_name |
  SCHEMA object_name |
  SEQUENCE object_name |
  SERVER object_name |
*************** COMMENT ON
*** 52,58 ****
  TEXT SEARCH DICTIONARY object_name |
  TEXT SEARCH PARSER object_name |
  TEXT SEARCH TEMPLATE object_name |
! TRIGGER trigger_name ON table_name |
  TYPE object_name |
  VIEW object_name
  } IS 'text'
--- 52,58 ----
  TEXT SEARCH DICTIONARY object_name |
  TEXT SEARCH PARSER object_name |
  TEXT SEARCH TEMPLATE object_name |
! TRIGGER trigger_name ON table_or_view_name |
  TYPE object_name |
  VIEW object_name
  } IS 'text'
*************** COMMENT ON
*** 97,105 ****
  object_name
  table_name.column_name
- agg_name
  constraint_name
  function_name
  operator_name
--- 97,106 ----
+ agg_name
  object_name
+ table_or_view_name.column_name
  table_name.column_name
  constraint_name
  function_name
  operator_name
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 682cf8a..46aae54 100644
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
*************** describeOneTableDetails(const char *sche
*** 1295,1302 ****
  	appendPQExpBuffer(&buf, "\n  NULL AS attcollation");
  	if (tableinfo.relkind == 'i')
  		appendPQExpBuffer(&buf, ",\n  pg_catalog.pg_get_indexdef(a.attrelid, a.attnum, TRUE) AS indexdef");
! 	if (verbose)
! 		appendPQExpBuffer(&buf, ",\n  a.attstorage, pg_catalog.col_description(a.attrelid, a.attnum)");
  	appendPQExpBuffer(&buf, "\nFROM pg_catalog.pg_attribute a");
  	appendPQExpBuffer(&buf, "\nWHERE a.attrelid = '%s' AND a.attnum > 0 AND NOT a.attisdropped", oid);
  	appendPQExpBuffer(&buf, "\nORDER BY a.attnum;");
--- 1295,1309 ----
  	appendPQExpBuffer(&buf, "
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 6:43 PM, Noah Misch wrote:
> On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote:
>> Profiling this combination of patches reveals that there is still some
>> pretty ugly spinlock contention on sinval's msgNumLock. And it occurs
>> to me that on x86, we really don't need this lock ... or
>> SInvalReadLock ... or a per-backend mutex. The whole of
>> SIGetDataEntries() can pretty easily be made lock-free. The only real
>> changes that seem to be needed are (1) to use a 64-bit counter, so
>> you never need to decrement
>
> On second thought, won't this be inadequate on 32-bit systems, where
> updating the 64-bit counter produces two stores? You must avoid reading
> it between those stores.

Now that is a potentially big problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 6:44 PM, Dan Ports wrote:
> If you're suggesting that hardware memory barriers aren't going to be
> needed to implement lock-free code on x86, that isn't true. Because a
> read can be reordered with respect to a write to a different memory
> location, you can still have problems. So you do still need memory
> barriers, just fewer of them.
>
> Dekker's algorithm is the classic example: two threads each set a flag
> and then check whether the other thread's flag is set. In any
> sequential execution, at least one should see the other's flag set, but
> on the x86 that doesn't always happen. One thread's read might be
> reordered before its write.

In the case of sinval, what we need to do for SIGetDataEntries() is,
approximately, a bunch of loads, followed by a store to one of the
locations we loaded (which no one else can have written meanwhile). So I
think that's OK.

In SIInsertDataEntries(), what we need to do is, approximately, take an
lwlock, load from a location which can only be written while holding the
lwlock, do a bunch of stores, ending with a store to that first location,
and release the lwlock. I think that's OK, too.

>> 2. Machines with weak memory ordering. On this category of machines
>> (which includes PowerPC, DEC Alpha, and maybe some others), the CPU
>> reorders memory accesses arbitrarily unless you explicitly issue
>> instructions that enforce synchronization. You still need to keep the
>> compiler from moving things around, too. Alpha is particularly
>> pernicious, because something like a->b can fetch the pointed-to value
>> before loading the pointer itself. This is otherwise known as "we
>> have basically no cache coherency circuits on this chip at all". On
>> these machines, you need to issue an explicit memory barrier
>> instruction at each sequence point, or just acquire and release a
>> spinlock.
>
> The Alpha is pretty much unique (thankfully!) in allowing dependent
> reads to be reordered. That makes it even weaker than the typical
> weak-ordering machine. Since reading a pointer and then dereferencing
> it is a pretty reasonable thing to do regularly in RCU code, you
> probably don't want to emit barriers in between on architectures where
> it's not actually necessary. That argues for another operation that's
> defined to be a barrier (mb) on the Alpha but a no-op elsewhere.
> Certainly the Linux kernel found it useful to do so
> (read_barrier_depends).
>
> Alternatively, one might question how important it is to support the
> Alpha these days...

Well, currently, we do, so we probably don't want to drop support for
that without some careful thought. I searched the archive and found
someone trying to compile 8.3.something on Alpha just a few years ago, so
it's apparently not totally dead yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
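The operation Dan describes can be sketched as follows. The macro name mirrors the Linux kernel's read_barrier_depends(), and the Alpha asm is an assumption for illustration, not existing PostgreSQL code:

```c
/* A data-dependency barrier: a real instruction only on Alpha, a no-op
 * everywhere else, because every other mainstream CPU already orders a
 * dependent read after the load of the pointer it depends on. */
#if defined(__alpha__)
#define read_barrier_depends()	__asm__ __volatile__ ("mb" : : : "memory")
#else
#define read_barrier_depends()	((void) 0)
#endif

struct node { int value; };

/* RCU-style reader: load a shared pointer, then dereference it. */
int
read_shared(struct node **shared_ptr)
{
	struct node *p = *shared_ptr;	/* load the pointer */
	read_barrier_depends();			/* needed on Alpha only */
	return p->value;				/* dependent read through p */
}
```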
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 6:22 PM, Florian Pflug wrote:
> On Jul21, 2011, at 21:15 , Robert Haas wrote:
>> On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane wrote:
>>> Robert Haas writes:
>>>> ... On these machines, you need to issue an explicit memory barrier
>>>> instruction at each sequence point, or just acquire and release a
>>>> spinlock.
>>>
>>> Right, and the reason that a spinlock fixes it is that we have memory
>>> barrier instructions built into the spinlock code sequences on
>>> machines where it matters.
>>>
>>> To get to the point where we could do the sort of optimization Robert
>>> is talking about, someone will have to build suitable primitives for
>>> all the platforms we support. In the cases where we use gcc ASM in
>>> s_lock.h, it shouldn't be too hard to pull out the barrier
>>> instruction(s) ... but on platforms where we rely on OS-supplied
>>> functions, some research is going to be needed.
>>
>> Yeah, although falling back to SpinLockAcquire() and SpinLockRelease()
>> on a backend-private slock_t should work anywhere that PostgreSQL
>> works at all[1]. That will probably be slower than a memory fence
>> instruction and certainly slower than a compiler barrier, but the
>> point is that - right now - we're doing it the slow way everywhere.
>
> As I discovered while playing with various lockless algorithms to
> improve our LWLocks, spin locks aren't actually a replacement for
> a (full) barrier.
>
> Lock acquisition only really needs to guarantee that loads and stores
> which come after the acquisition operation in program order (i.e., in
> the instruction stream) aren't globally visible before that operation
> completes. This kind of barrier behaviour is often fittingly called an
> "acquire barrier".
>
> Similarly, a lock release operation only needs to guarantee that loads
> and stores which occur before that operation in program order are
> globally visible before the release operation completes. This, again,
> is fittingly called a "release barrier".
>
> Now assume the following code fragment
>
>   global1 = 1;
>   SpinLockAcquire();
>   SpinLockRelease();
>   global2 = 1;
>
> If SpinLockAcquire() has "acquire barrier" semantics, and
> SpinLockRelease() has "release barrier" semantics, then it's possible
> for the store to global1 to be delayed until after SpinLockAcquire(),
> and similarly for the store to global2 to be executed before
> SpinLockRelease() completes. In other words, what happens is
>
>   SpinLockAcquire();
>   global1 = 1;
>   global2 = 1;
>   SpinLockRelease();
>
> But once that can happen, there's no reason that it couldn't also be
>
>   SpinLockAcquire();
>   global2 = 1;
>   global1 = 1;
>   SpinLockRelease();
>
> I didn't check if any of our spin lock implementations is actually
> affected by this, but it doesn't seem wise to rely on them being full
> barriers, even if it may be true today.

Hmm. I'm not worried about that. AFAIK, only IA64 has such an
implementation, and our existing spinlock implementation doesn't use it.
If we were to add something like that in the future, we'd presumably know
that we were doing it, and would add the appropriate memory barrier
primitive at the same time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
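Florian's point can be made concrete with C11 atomics (a modern illustration; the 2011-era PostgreSQL spinlock code predates these primitives): an acquire/release pair only constrains what can move *into* the critical section, not stores that sit entirely outside it.

```c
#include <stdatomic.h>

static atomic_int lock_word;		/* 0 = free, 1 = held */
int global1, global2;

void
spin_acquire(void)
{
	int expected;
	do {
		expected = 0;
		/* acquire semantics: later loads/stores cannot move above this */
	} while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
													memory_order_acquire,
													memory_order_relaxed));
}

void
spin_release(void)
{
	/* release semantics: earlier loads/stores cannot move below this */
	atomic_store_explicit(&lock_word, 0, memory_order_release);
}

void
fragment(void)
{
	global1 = 1;		/* may legally be delayed past spin_acquire() ... */
	spin_acquire();
	spin_release();
	global2 = 1;		/* ... and this may be hoisted before spin_release(),
						 * so the two stores can become globally visible in
						 * either order -- exactly the hazard described */
}
```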
Re: [HACKERS] spinlock contention
On Jul18, 2011, at 04:36 , Robert Haas wrote:
> On Fri, Jul 8, 2011 at 6:02 AM, Florian Pflug wrote:
>>> I don't want to fiddle with your git repo, but if you attach a patch
>>> that applies to the master branch I'll give it a spin if I have time.
>>
>> Patch attached.
>>
>> Beware that it needs at least GCC 4.1, otherwise it'll use a
>> per-partition spin lock instead of "locked xadd" to increment the
>> shared counters.
>
> [ Back from vacation, catching up on email. ]
>
> gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)
>
> pgbench -n -S -T 180 -c 32 -j 32
>
> with lwlock-part patch:
> tps = 36974.644091 (including connections establishing)
>
> unpatched cd34647c666be867f95ef8fc0492c30356043f10:
> tps = 39432.064293 (including connections establishing)
>
> And with -c 8 -j 8:
>
> tps = 26946.202428 (including connections establishing)
> tps = 27206.507424 (including connections establishing)

:-( That's disappointing, to say the least.

I also completely fail to understand what the heck is going on there. I
mean, you did conclusively prove that commenting out the SInval stuff
made a huge difference. There's also supposed to be hardly any
invalidation going on during a pgbench -S run. So, since the patch
removes two of the three spin-lock acquisitions from SIGetDataEntries()
(so long as there are no exclusive lockers of SInvalReadLock), there
should be some effect. Or so I'd think, at least...

If anyone has a theory, I'd love to hear it.

best regards,
Florian Pflug
Re: [HACKERS] storing TZ along timestamps
On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote:
> - I'd commend capturing NOW() in a timestamptz field. That gives you:
>  1. What time the DB server thought it was, in terms of UT1
>  2. What timezone it thought was tied to that connection.

Except that it doesn't, and that's exactly the problem I'm trying to
solve here. I want to know what timezone we were using when we put a
value into timestamptz, which then got converted to UT1. Without a
reliable way to store what the timezone *was* at that time, we have no
way to go back to it.

Now, we can debate whether it makes more sense to store the original time
without conversion to UT1, or whether we should store the time after
converting it to UT1 (or whether we should offer both options), but that
debate is pointless without a good way to remember what timezone it
started out in.

Arguably, we could just create an add-on data type for storing that
timezone information, but that seems pretty daft to me: you're stuck
either storing raw text, which takes what should be a 12 byte datatype up
to a 20-30 byte type (8 byte timestamp + varlena + text of timezone
name), or you end up with major problems trying to keep an enum in sync
with what the database has available in its ZIC database.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
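Jim's size arithmetic can be sketched as a hypothetical C layout; the type and field names are invented for illustration. An 8-byte timestamp plus a 4-byte zone identifier gives the 12 significant bytes he mentions, though struct padding may round the in-memory sizeof up:

```c
#include <stdint.h>

/* Hypothetical fixed-width "timestamptztz": the instant in UTC plus a
 * reference to the zone it was entered in, instead of carrying a varlena
 * text zone name (8-byte timestamp + varlena header + name, 20-30 bytes). */
typedef struct
{
	int64_t		time_us;	/* microseconds since the PostgreSQL epoch, UTC */
	int32_t		zone_id;	/* index into a catalog of zic zone names */
} TimestampTzTz;
```

The open problem Jim points at is exactly the `zone_id` column: it only works if the zone names are catalogued stably, which is what the enum-sync complaint is about.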
Re: [HACKERS] sinval synchronization considered harmful
On Jul21, 2011, at 03:46 , Robert Haas wrote:
> Profiling this combination of patches reveals that there is still some
> pretty ugly spinlock contention on sinval's msgNumLock. And it occurs
> to me that on x86, we really don't need this lock ... or
> SInvalReadLock ... or a per-backend mutex. The whole of
> SIGetDataEntries() can pretty easily be made lock-free. The only real
> changes that seem to be needed are (1) to use a 64-bit counter, so
> you never need to decrement and (2) to recheck resetState after
> reading the entries from the queue, to see if we got reset while we
> were reading those entries. Since x86 guarantees that writes will
> become visible in the order they are executed, we only need to make
> sure that the compiler doesn't rearrange things. As long as we first
> read the maxMsgNum and then read the messages, we can't read garbage.
> As long as we read the messages before we check resetState, we will be
> sure to notice if we got reset before we read all the messages (which
> is the only way that we can have read garbage messages).

Sounds sensible. There's one additional hazard though - you'll also need
the reads to be atomic. x86 guarantees that for loads of up to 32 bits
(i386) respectively 64 bits (x64), but only for reads from properly
aligned addresses (4 bytes for 4-byte reads, 8 bytes for 8-byte reads).

I found that out the hard way a few days ago, again while playing with
different LWLock implementations, when I botched my test setup and the
proc array entries ended up being misaligned. Boy, was it fun to debug
the random crashes caused by non-atomic pointer reads...

If we widen the counter to 64 bits, reading it atomically becomes a bit
of a challenge on i386, but is doable also. From what I remember, there
are two options. You can either use the 8-byte compare-and-exchange
operation, but it might be that only quite recent CPUs support that. The
other option seems to be to use floating-point instructions. I believe
the latter is what Intel's own Threading Building Blocks library does,
but I'd have to re-check to be sure.

It might also be that, once you start using floating-point instructions,
you find that you actually do need fencing instructions even on x86.
Dunno if the weaker ordering affects only SIMD instructions or all
floating point stuff...

best regards,
Florian Pflug
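With today's C11 atomics the compiler picks the mechanism Florian describes by itself: for an `_Atomic` 64-bit load, GCC on i386 typically falls back to the 8-byte compare-and-exchange (cmpxchg8b), while on x86-64 a plain aligned load suffices. A sketch (C11 postdates this 2011 thread, so this is illustrative, not what was proposed):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit message counter; on a 32-bit target a non-atomic
 * update would be two stores, and a concurrent reader could observe a
 * torn value between them -- the hazard Noah raised upthread. */
static _Atomic uint64_t maxMsgNum;

void
advance_counter(void)
{
	atomic_fetch_add_explicit(&maxMsgNum, 1, memory_order_relaxed);
}

uint64_t
read_counter(void)
{
	/* never observes a half-updated value, whatever the word size */
	return atomic_load_explicit(&maxMsgNum, memory_order_relaxed);
}
```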
Re: [HACKERS] sinval synchronization considered harmful
On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote:
> Profiling this combination of patches reveals that there is still some
> pretty ugly spinlock contention on sinval's msgNumLock. And it occurs
> to me that on x86, we really don't need this lock ... or
> SInvalReadLock ... or a per-backend mutex. The whole of
> SIGetDataEntries() can pretty easily be made lock-free. The only real
> changes that seem to be needed are (1) to use a 64-bit counter, so
> you never need to decrement

On second thought, won't this be inadequate on 32-bit systems, where
updating the 64-bit counter produces two stores? You must avoid reading
it between those stores.

--
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 02:31:15PM -0400, Robert Haas wrote:
> 1. Machines with strong memory ordering. On this category of machines
> (which include x86), the CPU basically does not perform loads or
> stores out of order. On some of these machines, it is apparently
> possible for there to be some reordering of stores relative to loads,
> but if the program stores two values or loads two values, those
> operations will be performed in the same order they appear in the
> program.

This is all correct, but...

> The main thing you need to make your code work reliably on
> these machines is a primitive that keeps the compiler from reordering
> your code during optimization.

If you're suggesting that hardware memory barriers aren't going to be
needed to implement lock-free code on x86, that isn't true. Because a
read can be reordered with respect to a write to a different memory
location, you can still have problems. So you do still need memory
barriers, just fewer of them.

Dekker's algorithm is the classic example: two threads each set a flag
and then check whether the other thread's flag is set. In any sequential
execution, at least one should see the other's flag set, but on the x86
that doesn't always happen. One thread's read might be reordered before
its write.

> 2. Machines with weak memory ordering. On this category of machines
> (which includes PowerPC, DEC Alpha, and maybe some others), the CPU
> reorders memory accesses arbitrarily unless you explicitly issue
> instructions that enforce synchronization. You still need to keep the
> compiler from moving things around, too. Alpha is particularly
> pernicious, because something like a->b can fetch the pointed-to value
> before loading the pointer itself. This is otherwise known as "we
> have basically no cache coherency circuits on this chip at all". On
> these machines, you need to issue an explicit memory barrier
> instruction at each sequence point, or just acquire and release a
> spinlock.

The Alpha is pretty much unique (thankfully!) in allowing dependent reads
to be reordered. That makes it even weaker than the typical weak-ordering
machine. Since reading a pointer and then dereferencing it is a pretty
reasonable thing to do regularly in RCU code, you probably don't want to
emit barriers in between on architectures where it's not actually
necessary. That argues for another operation that's defined to be a
barrier (mb) on the Alpha but a no-op elsewhere. Certainly the Linux
kernel found it useful to do so (read_barrier_depends).

Alternatively, one might question how important it is to support the
Alpha these days...

Dan

--
Dan R. K. Ports              MIT CSAIL              http://drkp.net/
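Dan's Dekker-style example, sketched with C11 atomics (illustrative; thread 1 would run the mirror image with the flags swapped). With seq_cst operations the compiler emits the full barrier he describes (e.g. mfence or a locked instruction on x86) between the store and the load; with relaxed or even acquire/release operations, both threads could see the other's flag unset.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag0, flag1;

/* Thread 0's half of the handshake: announce ourselves, then check
 * whether the peer has announced itself. */
bool
thread0_sees_peer(void)
{
	atomic_store(&flag0, true);		/* seq_cst store ... */
	return atomic_load(&flag1);		/* ... fully ordered before this load */
}
```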
Re: [HACKERS] storing TZ along timestamps
On Thu, Jul 21, 2011 at 5:48 PM, Jim Nasby wrote:
> On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote:
>>> I have my doubts about that, and I hope not. These details haven't
>>> been discussed at all; I only started this thread to get community
>>> approval on cataloguing the TZs.
>>
>> I am strongly in favor of having a *timezone* data type and some system
>> whereby we can uniquely identify timezones in the Zic database. That
>> would be tremendously useful for all sorts of things. I'm just
>> asserting that those who want a composite timestamp+saved-time-zone
>> data type have not thought about all of the complications involved.
>
> Having to deal with timezones completely separate from their timestamps
> is a huge PITA. That said, if we had a timezone datatype there's at
> least the possibility of using a composite type to deal with all of
> this. Or at least we can just create a custom datatype using existing
> tools... the only part of this that I see that actually requires closer
> core support is the timezone data itself.
>
> So if the community is OK with adding a timezone datatype then we can
> focus on that and leave the timestamptztz data type as an add-on (at
> least assuming we don't run into any gotchas).

As I have been watching this whole thread, my inclination has been to
look at this from a "Prolog" perspective, where we think about the
database as indicating a series of assertions about facts, from which we
then try to reason. I suspect that determining what *really* needs to get
recorded depends on this. And it seems to me that trying to head down the
path of defining oid-based lookups of timezone names may be putting the
cart before the horse.

There are a number of facts about a timestamp:

1. What time did the database server think it was?

SELECT NOW(); captures the database's concept of what time it was,
complete with:
 a) The time, based, I think, on UT1. With the caveat that there's no
    certainty that the DB server's time is necessarily correct.
 b) An encoding of the timezone offset based on the value of the
    TimeZone GUC for this connection.

If one is running an NTP daemon, pointing to a decently-connected network
of NTP servers, then it's likely that this time is pretty accurate. And
most of the time, I'd be inclined to treat this as authoritative, and
contend that anything else is likely to be less correct and less easy to
work with.

The goal of this discussion thread is to record another timestamp with a
different basis. It's not entirely clear what its basis is. I'll suggest
one, which mostly underlines my contention that it's likely "less
correct" and "less easy to work with" than having a column defined as...

  some_timestamp timestamp with timezone default NOW()

2. Client-based timestamp, comprising two things:
 a) A time, ascertained by the client.
 b) A timezone, ascertained by the client.

Note that timezones are pretty open-ended. There is an authoritative
encoding defined in the tz database, but there are other values used out
there. We had to patch Slony-I to have it use 'ISO' timestamps, and
recommend running in GMT/UTC, because there are values that blow things
up. For instance, on AIX, there is a habit for boxes to set TZ=CUT0, out
of the box, which isn't on what PostgreSQL considers to be the "official
list."

On the more whimsical side of things, Joey Hess, a Debian developer noted
for such things as ikiwiki, etckeeper, and git-annex, decided to create
his very own custom timezone, "JEST", because he was irritated about DST.
http://kitenet.net/~joey/blog/entry/howto_create_your_own_time_zone/
http://kitenet.net/~joey/blog/entry/JEST_results/

That "whimsical" entry won't be going into tzdata, and while we could
discount this case as whimsy, it's not far-fetched for organizations such
as nation states to decide to legislate their own things, which we can't
be certain will necessarily get into tzdata. There are enough aliases and
possibilities of local national decisions to make it at least somewhat
troublesome to treat this as something that can be considered fixed down
to the OID level.

My conclusion would be that if someone is really, really, really keen on
capturing their own notion of timezone, then this fits with the notion
that, if they want to have something that could be treated as remotely
authoritative, they should capture a multiplicity of pieces of datestamp
information, and actively accept that this will be pretty duplicative.

- I'd commend capturing NOW() in a timestamptz field. That gives you:
  1. What time the DB server thought it was, in terms of UT1
  2. What timezone it thought was tied to that connection.

- Also, I'd be inclined to capture, in plain text form:
  3. A client-recorded timestamp. I'm agnostic as to whether this has
     *any* validation done on it; I'd think it plausible that this is
     simply a text field, that might require a human to interpret it.
  4. A client-recorded timezone. This would be a plain text field,
Re: [HACKERS] sinval synchronization considered harmful
On Jul21, 2011, at 21:15 , Robert Haas wrote: > On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane wrote: >> Robert Haas writes: >>> ... On these machines, you need to issue an explicit memory barrier >>> instruction at each sequence point, or just acquire and release a >>> spinlock. >> >> Right, and the reason that a spinlock fixes it is that we have memory >> barrier instructions built into the spinlock code sequences on machines >> where it matters. >> >> To get to the point where we could do the sort of optimization Robert >> is talking about, someone will have to build suitable primitives for >> all the platforms we support. In the cases where we use gcc ASM in >> s_lock.h, it shouldn't be too hard to pull out the barrier >> instruction(s) ... but on platforms where we rely on OS-supplied >> functions, some research is going to be needed. > > Yeah, although falling back to SpinLockAcquire() and SpinLockRelease() > on a backend-private slock_t should work anywhere that PostgreSQL > works at all[1]. That will probably be slower than a memory fence > instruction and certainly slower than a compiler barrier, but the > point is that - right now - we're doing it the slow way everywhere. As I discovered while playing with various lockless algorithms to improve our LWLocks, spin locks aren't actually a replacement for a (full) barrier. Lock acquisition only really needs to guarantee that loads and stores which come after the acquisition operation in program order (i.e., in the instruction stream) aren't globally visible before that operation completes. This kind of barrier behaviour is often fittingly called "acquire barrier". Similarly, a lock release operation only needs to guarantee that loads and stores which occur before that operation in program order are globally visible before the release operation completes. This, again, is fittingly called "release barrier". 
Now assume the following code fragment:

  global1 = 1;
  SpinLockAcquire();
  SpinLockRelease();
  global2 = 1;

If SpinLockAcquire() has "acquire barrier" semantics, and SpinLockRelease() has "release barrier" semantics, then it's possible for the store to global1 to be delayed until after SpinLockAcquire(), and similarly for the store to global2 to be executed before SpinLockRelease() completes. In other words, what happens is

  SpinLockAcquire();
  global1 = 1;
  global2 = 1;
  SpinLockRelease();

But once that can happen, there's no reason that it couldn't also be

  SpinLockAcquire();
  global2 = 1;
  global1 = 1;
  SpinLockRelease();

I didn't check whether any of our spin lock implementations is actually affected by this, but it doesn't seem wise to rely on them being full barriers, even if it may be true today.

best regards, Florian Pflug

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
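The acquire/release distinction Florian describes maps directly onto the C11 memory model. The following is an illustrative sketch only; the lock variable, globals, and function names are made up for this example and are not PostgreSQL's actual s_lock.h primitives:

```c
#include <stdatomic.h>

static atomic_int lock;           /* stand-in for a slock_t */
static int global1, global2;      /* ordinary, non-atomic globals */

/* Acquire barrier: later loads/stores can't move up past the lock,
 * but an EARLIER store (global1 = 1) may still sink below it. */
static void spin_lock_acquire(void)
{
    while (atomic_exchange_explicit(&lock, 1, memory_order_acquire))
        ;   /* spin */
}

/* Release barrier: earlier loads/stores can't move down past the
 * unlock, but a LATER store (global2 = 1) may still rise above it. */
static void spin_lock_release(void)
{
    atomic_store_explicit(&lock, 0, memory_order_release);
}

/* Florian's fragment: under acquire/release semantics alone, both
 * stores may migrate into the critical section and then reorder
 * relative to each other on a weakly ordered machine. */
int run_fragment(void)
{
    global1 = 1;
    spin_lock_acquire();
    spin_lock_release();
    global2 = 1;

    /* Only a full barrier forbids both migrations; this is what the
     * assumption "spinlocks are full barriers" implicitly promises: */
    atomic_thread_fence(memory_order_seq_cst);
    return global1 + global2;
}
```

On x86 the reordering rarely manifests because the hardware model is comparatively strong; on PPC or ARM the acquire/release-only version really can let the two stores become visible in either order.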
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 4:54 PM, Noah Misch wrote: > On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote: >> For the last week or so, in between various other tasks, I've been >> trying to understand why performance drops off when you run pgbench -n >> -S -c $CLIENTS -j $CLIENTS -T $A_FEW_MINUTES at very high client >> counts. The answer, in a word, is SIGetDataEntries(). I believe we >> need to bite the bullet and rewrite this using a lock-free algorithm, >> using memory barriers on processors with weak memory ordering. >> Perhaps there is another way to do it, but nothing I've tried has >> really worked so far, and I've tried quite a few things. Here's the >> data. >> >> On unpatched master, performance scales pretty much linearly out to 32 >> clients. As you add more clients, it drops off: > >> [80 clients] >> tps = 132518.586371 (including connections establishing) >> tps = 130968.749747 (including connections establishing) >> tps = 132574.338942 (including connections establishing) > >> [80 clients, with lazy vxid locks and sinval-unlocked] >> tps = 203256.701227 (including connections establishing) >> tps = 190637.957571 (including connections establishing) >> tps = 190228.617178 (including connections establishing) > > Nice numbers. The sinval-unlocked.patch implementation looks like it's > taking a > sound direction. > > In > http://archives.postgresql.org/message-id/ca+tgmobbxmh_9zjudheswo6m8sbmb5hdzt+3chcluv5eztv...@mail.gmail.com, > you quoted 210k TPS when you stubbed out AcceptInvalidationMessages(). Is it > correct to conclude that AcceptInvalidationMessages() still reduces the > transaction rate by 5-10% with this stack of patches? Good question - I have not tested. One idea I just had... 
if we use a 64-bit counter for maxMsgNum, maybe we could make AcceptInvalidationMessages() a macro, something like this:

  if (MyProcState->nextMsgNum != shmInvalState->maxMsgNum)
      ReallyAcceptInvalidationMessages();

That ought to be extremely cheap and - if we use 64-bit counters for the message-number counters - safe. You might object that the load of maxMsgNum might migrate backward, but it can't possibly back up any further than the preceding lock acquisition, since that's required to be a full memory barrier on every architecture. And if we haven't acquired a relevant lock, then a relevant sinval message could show up an instant after we check regardless of the implementation.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
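The shape of that fast path can be sketched in portable C. Everything below is hypothetical scaffolding, not the actual sinval code: the struct layouts and the use of C11 atomics are assumptions, and only the field names (nextMsgNum, maxMsgNum) come from the thread. The point is the single unlocked 64-bit equality test guarding the expensive path:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the shared and per-backend state. */
typedef struct { _Atomic uint64_t maxMsgNum; } SharedInvalState;
typedef struct { uint64_t nextMsgNum; } ProcState;

static int slow_path_calls;   /* instrumentation for this sketch only */

/* The expensive path: takes locks, copies messages, catches up. */
static void ReallyAcceptInvalidationMessages(ProcState *me, SharedInvalState *shared)
{
    slow_path_calls++;
    /* ... locking and message processing elided ... */
    me->nextMsgNum = atomic_load(&shared->maxMsgNum);
}

/* The proposed cheap test: with a monotonically increasing 64-bit
 * counter that never wraps in practice, equality really does mean
 * "no new messages", so no lock is needed to skip the slow path. */
static void AcceptInvalidationMessages(ProcState *me, SharedInvalState *shared)
{
    if (me->nextMsgNum != atomic_load(&shared->maxMsgNum))
        ReallyAcceptInvalidationMessages(me, shared);
}
```

A stale read of maxMsgNum here is harmless for the reason Robert gives: without a relevant lock held, a message could arrive an instant after the check anyway.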
Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator
On Jul 19, 2011, at 10:51 PM, Pavel Stehule wrote: >> If you mean that such checks would be done automatically, no, they >> shouldn't be. Consider a function that creates a table and then uses >> it, or even just depends on using a table that doesn't yet exist when >> you do CREATE FUNCTION. > > yes, any deep check is not possible for function that uses a temporary tables. > > A plpgsql_lint is not silver bullet - for these cases is necessary to > disable lint. > > . I can't to speak generally - I have no idea, how much percent of > functions are functions with access to temporary tables - in my last > project I use 0 temp tables on cca 300 KB of plpgsql code. > > The more terrible problem is a new dependency between functions. I use > a workaround - some like headers You can work around temp table issues the same way: just define the temp table before you create the function. In practice, if I have a function that depends on a temp table it either creates it itself if it doesn't already exist or I have a separate function to create the table; that way you have a single place that has the temp table definition, and that is in the database itself. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] storing TZ along timestamps
Jim Nasby writes: > On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote: >> There was an earlier point made that if someone puts eg 5pm local time >> two years in the future into the database, and then the DST boundary >> gets moved subsequently, some applications would like the value to >> still say 5pm local time, even though that means it now refers to a >> different point in absolute time - this potentially seems like a >> useful feature. Retroactive timezone changes wouldn't make a lot of >> sense in this case though... > Right; and timezone's aren't supposed to change retroactively. The ZIC > database is specifically setup so that it knows the history of TZ changes and > deals with the past correctly. You haven't noticed that at least two or three times a year, there are "historical corrections" in the ZIC database? The mapping between local time and UTC might be less likely to change for a time instant in the past than one in the future, but it would be folly to assume that it's immutable in either direction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] storing TZ along timestamps
On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote:

>> I have my doubts about that, and I hope not. These details haven't been
>> discussed at all; I only started this thread to get community approval
>> on cataloguing the TZs.
>
> I am strongly in favor of having a *timezone* data type and some system
> whereby we can uniquely identify timezones in the Zic database. That
> would be tremendously useful for all sorts of things. I'm just
> asserting that those who want a composite timestamp+saved-time-zone data
> type have not thought about all of the complications involved.

Having to deal with timezones completely separate from their timestamps is a huge PITA. That said, if we had a timezone datatype there's at least the possibility of using a composite type to deal with all of this. Or at least we can just create a custom datatype using existing tools... the only part of this that I see that actually requires closer core support is the timezone data itself.

So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data type as an add-on (at least assuming we don't run into any gotchas).

Alvaro, please speak up if there are any technical issues here that I've missed?

-- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] storing TZ along timestamps
On Jul 21, 2011, at 2:39 PM, Jim Nasby wrote:

>> CREATE OR REPLACE FUNCTION is_timezone(
>> tz CITEXT
>> ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$
> ...
>> CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) );
>
> Storing giant globs of text with every timestamp field is really ugly.

You work with what you've got.

David
Re: [HACKERS] storing TZ along timestamps
On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote:

> On 19 July 2011 17:11, Kevin Grittner wrote:
>> Tom Lane wrote:
>>> "Kevin Grittner" writes:
>>>> Josh Berkus wrote:
>>>>> The timestamp and the timezone in which that timestamp was
>>>>> entered are two separate pieces of data and *ought* to be in two
>>>>> separate fields.
>>>>
>>>> So, if you're grabbing a timestamp and the time zone for it, how do
>>>> you ensure you've done that atomically if you're at the boundary of
>>>> a DST change?
>>>
>>> In my view of the world, the timezone that you are in is not an
>>> object that changes across a DST boundary.
>>
>> You're right -- the moment in time should be fixed like in the
>> current PostgreSQL "timestamp with time zone", and the time zone
>> doesn't change with DST. Not an intentional red herring, but
>> definitely some muddy thinking there.
>
> There was an earlier point made that if someone puts eg 5pm local time
> two years in the future into the database, and then the DST boundary
> gets moved subsequently, some applications would like the value to
> still say 5pm local time, even though that means it now refers to a
> different point in absolute time - this potentially seems like a
> useful feature. Retroactive timezone changes wouldn't make a lot of
> sense in this case though...

Right; and timezones aren't supposed to change retroactively. The ZIC database is specifically set up so that it knows the history of TZ changes and deals with the past correctly.

> I guess there are three concepts of time here - an absolute fixed time
> with no reference to a timezone, a time with a timezone that is still
> set as a fixed point in time, or a local time in a specific timezone
> that would move if the timezone definition changed.

Or, another way to put the third class: a timestamp that remembers what its original timezone was so that you can refer to it in a common timezone (such as UTC), OR you can refer to it at its original, local time.
That's our exact need for this: we have different businesses that operate in different timezones. Generally, we only care about things in local time, but there are cases (such as event logging) where we could care about local *OR* unified time. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
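Jim's "local *OR* unified" distinction boils down to keeping the absolute instant and the original zone as two pieces of data and rendering the same instant either way. A small illustrative C sketch (not PostgreSQL code; the struct, function names, and the fixed-offset POSIX zone "EST5" standing in for a real tz database name are all assumptions of this example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* A timestamp that remembers its original zone: the instant itself is
 * absolute (seconds since the Unix epoch); the zone rides alongside. */
struct zoned_ts {
    time_t instant;
    char   zone[64];   /* here a POSIX fixed-offset spec like "EST5" */
};

/* "Unified" view: render the instant in UTC. */
static void render_utc(const struct zoned_ts *ts, char *buf, size_t len)
{
    struct tm tm;
    gmtime_r(&ts->instant, &tm);
    strftime(buf, len, "%Y-%m-%d %H:%M", &tm);
}

/* "Local" view: render the instant in its remembered zone.  Setting TZ
 * and calling tzset() is crude, but portable enough for a sketch. */
static void render_local(const struct zoned_ts *ts, char *buf, size_t len)
{
    struct tm tm;
    setenv("TZ", ts->zone, 1);
    tzset();
    localtime_r(&ts->instant, &tm);
    strftime(buf, len, "%Y-%m-%d %H:%M", &tm);
}
```

With instant = 86400 and zone "EST5" (UTC-5), the unified view reads "1970-01-02 00:00" while the local view reads "1970-01-01 19:00": one absolute time, two presentations, which is exactly the event-logging case described above.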
Re: [HACKERS] storing TZ along timestamps
On Jul 19, 2011, at 4:20 PM, David E. Wheeler wrote: > On Jul 19, 2011, at 2:06 PM, Josh Berkus wrote: > >> I am strongly in favor of having a *timezone* data type and some system >> whereby we can uniquely identify timezones in the Zic database. > > CREATE OR REPLACE FUNCTION is_timezone( >tz CITEXT > ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$ ... > CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) ); Storing giant globs of text with every timestamp field is really ugly. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and log file output on Windows
I started thinking more about this --- we already allow multiple processes to write to a single file when running Postgres as a backend on Windows. I think it is these open() flags that make it possible: (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE), However, we are not going to be able to make all shell and command-line tools use those flags, so the best solution seems to just avoid the extra logging. I just added a C comment to pg_upgrade to document this. --- bruce wrote: > > I have fixed the bug below with the attached patches for 9.0, 9.1, and > 9.2. I did a minimal patch for 9.0 and 9.1, and a more thorough patch > for 9.2. > > The use of -l/log was tested originally in pg_migrator (for 8.4) and > reported to be working, but it turns out it only worked in 'check' mode, > and threw an error during actual upgrade. This bug was reported to me > recently by EnterpriseDB testing of PG 9.0 on Windows. The 9.2 C > comments should make it clear that Windows doesn't allow multiple > processes to write to the same file and should avoid future breakage. > > --- > > Bruce Momjian wrote: > > Has anyone successfully used pg_upgrade 9.0 with -l (log) on Windows? > > > > I received a private email bug report that pg_upgrade 9.0 does not work > > with the -l/log option on Windows. The error is: > > > > Analyzing all rows in the new cluster > > ""c:/MinGW/msys/1.0/home/edb/inst/bin/vacuumdb" --port 55445 --username > > "edb" --all --analyze > > >> c:/MinGW/msys/1.0/home/edb/auxschedule/test.log 2>&1" > > The process cannot access the file because it is being used by another > > process. > > > > What has me confused is this same code exists in pg_migrator, which was > > fixed to work with -l on Windows by Hiroshi Saito with this change: > > > > /* > > * On Win32, we can't send both server output and pg_ctl output > > * to the same file because we get the error: > > * "The process cannot access the file because it is being used by > > another process." 
> > * so we have to send pg_ctl output to 'nul'. > > */ > > sprintf(cmd, SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " > > "-o \"-p %d -c autovacuum=off -c > > autovacuum_freeze_max_age=20\" " > > "start >> \"%s\" 2>&1" SYSTEMQUOTE, > > bindir, ctx->logfile, datadir, port, > > #ifndef WIN32 > > ctx->logfile); > > #else > > DEVNULL); > > #endif > > > > The fix was not to use the same log file and output file for pg_ctl. > > But as you can see, the pg_ctl and vacuumdb code is unchanged: > > > > prep_status(ctx, "Analyzing all rows in the new cluster"); > > exec_prog(ctx, true, > > SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" " > > "--all --analyze >> %s 2>&1" SYSTEMQUOTE, > > ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile); > > > > I can't figure out of there is something odd about this user's setup or > > if there is a bug in pg_upgrade with -l on Windows. > > > > -- > > Bruce Momjian http://momjian.us > > EnterpriseDB http://enterprisedb.com > > > > + It's impossible for everything to be true. + > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + [ text/x-diff is unsupported, treating like TEXT/PLAIN ] > diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c > new file mode 100644 > index 1a515e7..37d2eed > *** a/contrib/pg_upgrade/pg_upgrade.c > --- b/contrib/pg_upgrade/pg_upgrade.c > *** prepare_new_cluster(migratorContext *ctx > *** 161,168 > prep_status(ctx, "Analyzing all rows in the new cluster"); > exec_prog(ctx, true, > SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username > \"%s\" " > ! "--all --analyze >> %s 2>&1" SYSTEMQUOTE, > ! 
ctx->new.bindir, ctx->new.port, ctx->user, > ctx->logfile); > check_ok(ctx); > > /* > --- 161,174 > prep_status(ctx, "Analyzing all rows in the new cluster"); > exec_prog(ctx, true, > SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username > \"%s\" " > ! "--all --analyze >> \"%s\" 2>&1" SYSTEMQUOTE, > ! ctx->new.bindir, ctx->new.port, ctx->user, > ! #ifndef WIN32 > ! ctx->logfile > ! #else > ! DEVNU
Re: [HACKERS] sinval synchronization considered harmful
On Wed, Jul 20, 2011 at 09:46:33PM -0400, Robert Haas wrote:

> For the last week or so, in between various other tasks, I've been
> trying to understand why performance drops off when you run pgbench -n
> -S -c $CLIENTS -j $CLIENTS -T $A_FEW_MINUTES at very high client
> counts. The answer, in a word, is SIGetDataEntries(). I believe we
> need to bite the bullet and rewrite this using a lock-free algorithm,
> using memory barriers on processors with weak memory ordering.
> Perhaps there is another way to do it, but nothing I've tried has
> really worked so far, and I've tried quite a few things. Here's the
> data.
>
> On unpatched master, performance scales pretty much linearly out to 32
> clients. As you add more clients, it drops off:
>
> [80 clients]
> tps = 132518.586371 (including connections establishing)
> tps = 130968.749747 (including connections establishing)
> tps = 132574.338942 (including connections establishing)
>
> [80 clients, with lazy vxid locks and sinval-unlocked]
> tps = 203256.701227 (including connections establishing)
> tps = 190637.957571 (including connections establishing)
> tps = 190228.617178 (including connections establishing)

Nice numbers. The sinval-unlocked.patch implementation looks like it's taking a sound direction.

In http://archives.postgresql.org/message-id/ca+tgmobbxmh_9zjudheswo6m8sbmb5hdzt+3chcluv5eztv...@mail.gmail.com, you quoted 210k TPS when you stubbed out AcceptInvalidationMessages(). Is it correct to conclude that AcceptInvalidationMessages() still reduces the transaction rate by 5-10% with this stack of patches?

-- Noah Misch, http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Single pass vacuum - take 1
On Thu, Jul 21, 2011 at 4:01 PM, Robert Haas wrote:

> I think we are better off doing only equality comparisons and dodging
> this problem altogether.

Fair enough.

> Just-plain-dead line pointers would have lp_off = 0. Dead-vacuumed
> line pointers would have lp_off != 0. The first vacuum would use
> lp_off = 1, the next one lp_off = 2, etc.
>
> Actually, come to think of it, we could fit a 30-bit counter into the
> line pointer. There are 15 unused bits in lp_off and 15 unused bits
> in lp_len.

That's clever! I think we can go this path and completely avoid any special area or additional header fields.

> If we use a counter that is large enough that we don't have to worry
> about wrap-around, I guess that's OK, though it seems a little weird
> to think about having different backends running with different ideas
> about the correct counter value.

I think that's fine. For example, every backend runs with a different RecentXmin today and that doesn't impact any functionality. It only limits how much they can prune at any given time. The same would happen by having a stale counter.

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 3:53 PM, Tom Lane wrote:

> Robert Haas writes:
>> I think the real challenge is going to be testing. If anyone has a
>> machine with weak memory ordering they can give me access to, that
>> would be really helpful for flushing the bugs out of this stuff.
>
> There are multi-CPU PPCen in the buildfarm, or at least there were last
> time I broke the sinval code ;-). Note that testing on a single-core
> PPC will prove nothing.

Yeah, I was just thinking about that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Single pass vacuum - take 1
On Thu, Jul 21, 2011 at 12:51 PM, Pavan Deolasee wrote: > The way we force the subsequent vacuum to also look at the pages scanned and > pruned by previous failed vacuum, all the pages that have dead-vacuum line > pointers would have a new stamp once the vacuum finishes successfully and > the pg_class would have the same stamp. That seems a bit fragile. One of the things we've talked about doing is skipping pages that are pinned by some other backend. Now maybe that would be infrequent enough not to matter... but... Also, I'm not sure that's the only potential change that would break this. I think we are better off doing only equality comparisons and dodging this problem altogether. > I thought more about it and for a moment believed that we can do this with > just a bit since we rescan the pages with dead and dead-vacuum line > pointers after an aborted vacuum, but concluded that a bit or a small > counter is not good enough since other backends might be running with a > stale value and would get fooled into believing that they can collect the > dead-vacuum line pointers before the index pointers are actually removed. We > can still use a 32-bit counter though since the wrap-around for that is > practically very large for any backend to still run with such a stale > counter (you would need more than 1 billion vacuums on the same table in > between for you to hit this). I think that's a safe assumption. >> The patch as >> submitted uses lp_off to store a single bit, to distinguish between >> vacuum and dead-vacuumed, but we could actually have (for greater >> safety and debuggability) a 15-byte counter that just wraps around >> from 32,767 to 1. (Maybe it would be wise to reserve a few counter >> values, or a few bits, or both, for future projects.) That would >> eliminate the need to touch PageRepairFragmentation() or use the >> special space, since all the information would be in the line pointer >> itself. 
Not having to rearrange the page to reclaim dead line >> pointers is appealing, too. > > Not sure if I get you here. We need a mechanism to distinguish between dead > and dead-vacuum line pointers. How would the counter (which I assume you > mean 15-bit and not byte) help solve that ? Or are you just suggesting > replacing LSN with the counter in the page header ? Just-plain-dead line pointers would have lp_off = 0. Dead-vacuumed line pointers would have lp_off != 0. The first vacuum would use lp_off = 1, the next one lp_off = 2, etc. Actually, come to think of it, we could fit a 30-bit counter into the line pointer. There are 15 unused bits in lp_off and 15 unused bits in lp_len. >> > Is there something in place to make sure that pruning uses an up-to-date >> > relindxvacxlogid/off value? I guess it doesn't matter if it's >> > out-of-date, >> > you'll just miss the opportunity to remove some dead tuples. >> >> This seems like a tricky problem, because it could cause us to >> repeatedly fail to remove the same dead line pointers, which would be >> poor. We could do something like this: after updating pg_class, >> vacuum send an interrupt to any backend which holds RowExclusiveLock >> or higher on that relation. The interrupt handler just sets a flag. >> If that backend does heap_page_prune() and sees the flag set, it knows >> that it needs to recheck pg_class. This is a bit grotty and doesn't >> completely close the race condition (the signal might not arrive in >> time), but it ought to make it narrow enough not to matter in >> practice. > > I am not too excited about adding that complexity to the code. Even if a > backend does not have up-to-date value, it will fail to collect the > dead-vacuum pointers, but soon either it will catch up or some other backend > will remove them or the next vacuum will take care of it. 
If we use a counter that is large enough that we don't have to worry about wrap-around, I guess that's OK, though it seems a little weird to think about having different backends running with different ideas about the correct counter value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
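The 30-bit counter Robert proposes is a straightforward bit-packing exercise: a dead-vacuumed line pointer points at no storage, so its 15-bit offset and 15-bit length fields are free to carry the two halves of a vacuum-generation counter. A sketch with a simplified stand-in struct (not the real ItemIdData, though the 15/2/15 field widths match the layout discussed in the thread):

```c
#include <stdint.h>

/* Simplified stand-in for a heap line pointer: 15 bits of offset,
 * 2 flag bits, 15 bits of length. */
typedef struct {
    unsigned lp_off   : 15;
    unsigned lp_flags : 2;
    unsigned lp_len   : 15;
} LinePointer;

/* For a dead-vacuumed item, off/len carry a 30-bit counter instead of
 * a tuple location.  Backends compare this counter for equality only,
 * per the discussion above, so wrap-around ordering never matters. */
static void set_vacuum_counter(LinePointer *lp, uint32_t counter)
{
    lp->lp_off = counter & 0x7FFF;          /* low 15 bits */
    lp->lp_len = (counter >> 15) & 0x7FFF;  /* high 15 bits */
}

static uint32_t get_vacuum_counter(const LinePointer *lp)
{
    return ((uint32_t) lp->lp_len << 15) | lp->lp_off;
}
```

Note that lp_off = 0 with a zero counter naturally preserves the "just plain dead" encoding only if the counter starts at 1, matching Robert's "first vacuum would use lp_off = 1" convention.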
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
2011/7/21 Robert Haas : > On Thu, Jul 21, 2011 at 3:25 PM, Yeb Havinga wrote: >> Is it possible to only include the syscache on --enable-selinux >> configurations? It would imply physical data incompatibility with standard >> configurations, but that's also true for e.g. the block size. > > Not really. SECURITY LABEL is supposedly a generic facility that can > be used by a variety of providers, and the regression tests load a > dummy provider which works on any platform to test that it hasn't > gotten broken. > >> Also, the tests I did with varying bucket sizes suggested that decreasing >> the syscache to 256 didn't show a significant performance decrease compared >> to the 2048 #buckets, for the restorecon test, which hits over 3000 objects >> with security labels. My guess is that that is a fair middle of the road >> database schema size. Are you unwilling to pay the startup overhead for a >> extra 256 syscache? > > Not sure. I'd rather not, if it's easy to rejigger things so we don't > have to. I don't think this is necessarily a hard problem to solve - > it's just that no one has tried yet. > Now, I tend to implement a cache mechanism to translate ObjectAddress to security label by sepgsql module itself, rather than generic syscache, although it requires a new hook on PrepareForTupleInvalidation() as Robert suggested in this thread. Indeed, it seems to me worthwhile not to allocate memory being unused for 90% of users; from perspective of startup performance and resource consumption. In addition, we may be potentially able to have a cache stuff well optimized to the access control of SELinux; such as cache reclaim for recently unused entries. So, I'd like to focus on the stuff in sepgsql/uavc.c right now. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Robert Haas writes:

> I think the real challenge is going to be testing. If anyone has a
> machine with weak memory ordering they can give me access to, that
> would be really helpful for flushing the bugs out of this stuff.

There are multi-CPU PPCen in the buildfarm, or at least there were last time I broke the sinval code ;-). Note that testing on a single-core PPC will prove nothing.

regards, tom lane
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
2011/7/21 Yeb Havinga : > >> Is it possible to only include the syscache on --enable-selinux >> configurations? It would imply physical data incompatibility with standard >> configurations, but that's also true for e.g. the block size. >> >> Also, the tests I did with varying bucket sizes suggested that decreasing >> the syscache to 256 didn't show a significant performance decrease compared >> to the 2048 #buckets, for the restorecon test, which hits over 3000 objects >> with security labels. My guess is that that is a fair middle of the road >> database schema size. Are you unwilling to pay the startup overhead for a >> extra 256 syscache? >> > > Hello KaiGai-san, > > off-list, > Unfortunatelly, not so... > I was wondering why the catalog pg_seclabel exists at all. Why not store the > labels together with the objects (pg_class, pg_attribute etc) ? The syscache > wouldn't be needed in that case. > Although current sepgsql support to assign security label on limited number of object classes (schema, relation, column and functions). However, we have planed to control accesses on whole of objects managed by PostgreSQL, not only these four. If we needed to expand system catalog everytime when an object get newly supported, the patch would be more invasive and make hard to upstream. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
Is it possible to only include the syscache on --enable-selinux configurations? It would imply physical data incompatibility with standard configurations, but that's also true for e.g. the block size. Also, the tests I did with varying bucket sizes suggested that decreasing the syscache to 256 didn't show a significant performance decrease compared to the 2048 #buckets, for the restorecon test, which hits over 3000 objects with security labels. My guess is that that is a fair middle of the road database schema size. Are you unwilling to pay the startup overhead for a extra 256 syscache? Hello KaiGai-san, off-list, I was wondering why the catalog pg_seclabel exists at all. Why not store the labels together with the objects (pg_class, pg_attribute etc) ? The syscache wouldn't be needed in that case. regards, Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
On Thu, Jul 21, 2011 at 3:25 PM, Yeb Havinga wrote: > Is it possible to only include the syscache on --enable-selinux > configurations? It would imply physical data incompatibility with standard > configurations, but that's also true for e.g. the block size. Not really. SECURITY LABEL is supposedly a generic facility that can be used by a variety of providers, and the regression tests load a dummy provider which works on any platform to test that it hasn't gotten broken. > Also, the tests I did with varying bucket sizes suggested that decreasing > the syscache to 256 didn't show a significant performance decrease compared > to the 2048 #buckets, for the restorecon test, which hits over 3000 objects > with security labels. My guess is that that is a fair middle of the road > database schema size. Are you unwilling to pay the startup overhead for a > extra 256 syscache? Not sure. I'd rather not, if it's easy to rejigger things so we don't have to. I don't think this is necessarily a hard problem to solve - it's just that no one has tried yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 8:25 PM, Robert Haas wrote: > On Thu, Jul 21, 2011 at 3:22 PM, Dave Page wrote: >> On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas wrote: >>> I think the real challenge is going to be testing. If anyone has a >>> machine with weak memory ordering they can give me access to, that >>> would be really helpful for flushing the bugs out of this stuff. >>> Getting it to work on x86 is not the hard part. >> >> I believe there's a PPC box in our storage facility in NJ that we >> might be able to dig out for you. There's also a couple in our India >> office. Let me know if they'd be of help. > > Yes! > > More processors is better, of course, but having anything at all to > test on would be an improvement. OK, will check with India first, as it'll be easier for them to deploy. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 3:22 PM, Dave Page wrote: > On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas wrote: >> I think the real challenge is going to be testing. If anyone has a >> machine with weak memory ordering they can give me access to, that >> would be really helpful for flushing the bugs out of this stuff. >> Getting it to work on x86 is not the hard part. > > I believe there's a PPC box in our storage facility in NJ that we > might be able to dig out for you. There's also a couple in our India > office. Let me know if they'd be of help. Yes! More processors is better, of course, but having anything at all to test on would be an improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
On 2011-07-21 15:03, Robert Haas wrote: On Thu, Jul 21, 2011 at 4:00 AM, Yeb Havinga wrote: Besides that I have to admit having problems understanding why the 5MB cache for pg_seclabel is a problem; its memory consumption is linear in the size of the underlying database. (in contrast with the other cache storing access vectors which would have O(n*m) space complexity if it didn't reclaim space). So it is proportional to the number of objects in a database and in size it seems to be in the same order as pg_proc, pg_class and pg_attribute. Fair enough. I'm not convinced that the sheer quantity of memory use is a problem, although I would like to see a few more test results before we decide that definitively. I *am* unwilling to pay the startup overhead of initializing an extra 2048 syscache that only sepgsql users will actually need. Is it possible to only include the syscache on --enable-selinux configurations? It would imply physical data incompatibility with standard configurations, but that's also true for e.g. the block size. Also, the tests I did with varying bucket sizes suggested that decreasing the syscache to 256 didn't show a significant performance decrease compared to the 2048 #buckets, for the restorecon test, which hits over 3000 objects with security labels. My guess is that that is a fair middle of the road database schema size. Are you unwilling to pay the startup overhead for an extra 256 syscache? -- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 8:15 PM, Robert Haas wrote: > > I think the real challenge is going to be testing. If anyone has a > machine with weak memory ordering they can give me access to, that > would be really helpful for flushing the bugs out of this stuff. > Getting it to work on x86 is not the hard part. I believe there's a PPC box in our storage facility in NJ that we might be able to dig out for you. There's also a couple in our India office. Let me know if they'd be of help. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 2:50 PM, Tom Lane wrote: > Robert Haas writes: >> ... On these machines, you need to issue an explicit memory barrier >> instruction at each sequence point, or just acquire and release a >> spinlock. > > Right, and the reason that a spinlock fixes it is that we have memory > barrier instructions built into the spinlock code sequences on machines > where it matters. > > To get to the point where we could do the sort of optimization Robert > is talking about, someone will have to build suitable primitives for > all the platforms we support. In the cases where we use gcc ASM in > s_lock.h, it shouldn't be too hard to pull out the barrier > instruction(s) ... but on platforms where we rely on OS-supplied > functions, some research is going to be needed. Yeah, although falling back to SpinLockAcquire() and SpinLockRelease() on a backend-private slock_t should work anywhere that PostgreSQL works at all[1]. That will probably be slower than a memory fence instruction and certainly slower than a compiler barrier, but the point is that - right now - we're doing it the slow way everywhere. I think the real challenge is going to be testing. If anyone has a machine with weak memory ordering they can give me access to, that would be really helpful for flushing the bugs out of this stuff. Getting it to work on x86 is not the hard part. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] This was a suggestion from Noah Misch. I wasn't quite convinced when he initially made it, but having studied the issue a lot more, I now am. The CPU doesn't know how many processes have the memory mapped into their address space. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
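The fallback Robert describes (a barrier obtained by acquiring and immediately releasing a spinlock that nobody else ever touches) can be illustrated in a self-contained way. This is only a sketch: it uses a C11 atomic_flag as a stand-in for PostgreSQL's slock_t, and the function name is hypothetical.

```c
#include <stdatomic.h>

/* Stand-in for a backend-private slock_t: no other thread or process
 * ever touches it, so the lock is always free when we take it. */
static atomic_flag private_lock = ATOMIC_FLAG_INIT;

/*
 * Acquire and release the private lock purely for the ordering side
 * effects: the (seq_cst) test-and-set and clear together act as a full
 * memory barrier.  This is portable anywhere spinlocks work at all,
 * just slower than a bare fence instruction.
 */
static void
fallback_memory_barrier(void)
{
    while (atomic_flag_test_and_set(&private_lock))
        ;                       /* never contended, so this never loops */
    atomic_flag_clear(&private_lock);
}
```

The footnote's point applies here too: the hardware cannot tell that the lock is private to one backend, so it must enforce the same ordering it would for a genuinely shared lock.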
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
"Albe Laurenz" writes: > I wrote a FDW for Oracle to a) learn some server coding > and b) see how well the FDW API works for me. > I came up with three questions/experiences: > 1) GetUserMapping throws an error if there is no >user mapping for the user (or PUBLIC). >I think that it would be much more useful if >it would return NULL or something similar instead. We could make it do that, but under what circumstances would it be useful to not throw an error? It doesn't seem like you should try to establish a remote connection anyway, if there's no mapping. > 3) I am confused by the order of function calls >during execution of a subplan. It is like this: > BeginForeignScan > ReScanForeignScan > IterateForeignScan > IterateForeignScan > ... > ReScanForeignScan > IterateForeignScan > IterateForeignScan > ... > EndForeignScan > So the first ReScan is done immediately after > BeginForeignScan. Moreover, internal parameters are not > set in the BeginForeignScan call. > This is probably working as designed, but BeginForeignScan > has no way to know whether it should execute a remote > query or not. I'd say it probably shouldn't, ever. If you look at the executor's node init functions, none of them do any actual data fetching. They just prepare data structures. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Robert Haas writes: > ... On these machines, you need to issue an explicit memory barrier > instruction at each sequence point, or just acquire and release a > spinlock. Right, and the reason that a spinlock fixes it is that we have memory barrier instructions built into the spinlock code sequences on machines where it matters. To get to the point where we could do the sort of optimization Robert is talking about, someone will have to build suitable primitives for all the platforms we support. In the cases where we use gcc ASM in s_lock.h, it shouldn't be too hard to pull out the barrier instruction(s) ... but on platforms where we rely on OS-supplied functions, some research is going to be needed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
On Thu, Jul 21, 2011 at 12:16 PM, Kevin Grittner wrote: > Very impressive! Those numbers definitely justify some #ifdef code > to provide alternatives for weak memory ordering machines versus > others. With the number of CPUs climbing as it is, this is very > important work! Thanks. I'm not thinking so much about #ifdef (although that could work, too) as I am about providing some primitives to allow this sort of code-writing to be done in a somewhat less ad-hoc fashion. It seems like there are basically two categories of machines we need to worry about. 1. Machines with strong memory ordering. On this category of machines (which includes x86), the CPU basically does not perform loads or stores out of order. On some of these machines, it is apparently possible for there to be some reordering of stores relative to loads, but if the program stores two values or loads two values, those operations will be performed in the same order they appear in the program. The main thing you need to make your code work reliably on these machines is a primitive that keeps the compiler from reordering your code during optimization. On x86, certain categories of exotic instructions do require 2. Machines with weak memory ordering. On this category of machines (which includes PowerPC, DEC Alpha, and maybe some others), the CPU reorders memory accesses arbitrarily unless you explicitly issue instructions that enforce synchronization. You still need to keep the compiler from moving things around, too. Alpha is particularly pernicious, because something like a->b can fetch the pointed-to value before loading the pointer itself. This is otherwise known as "we have basically no cache coherency circuits on this chip at all". On these machines, you need to issue an explicit memory barrier instruction at each sequence point, or just acquire and release a spinlock. 
So you can imagine a primitive that is defined to be a compiler barrier on machines with strong memory ordering, and as a memory fencing instruction on machines with weak memory ordering. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
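The primitive Robert imagines could be sketched as a macro that degrades to a pure compiler barrier where the hardware already keeps accesses ordered, and to a real fence elsewhere. This is a rough sketch for GCC-style compilers, not the barrier header PostgreSQL actually grew later; the macro name is made up.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
/*
 * Strong memory ordering: loads stay ordered with loads and stores with
 * stores, so it suffices to stop the *compiler* from reordering the code.
 * The empty asm with a "memory" clobber does that and emits no
 * instructions at all.
 */
#define sketch_barrier()  __asm__ __volatile__("" ::: "memory")
#else
/*
 * Weak memory ordering (PowerPC, Alpha, ...): the CPU itself may reorder
 * accesses, so emit a real fence instruction (which also acts as a
 * compiler barrier).
 */
#define sketch_barrier()  atomic_thread_fence(memory_order_seq_cst)
#endif
```

One caveat worth keeping in mind: even on x86 a store can be reordered past a later load of a different location, so whether the cheap form suffices depends on the algorithm; the claim in the thread is that it does for the patterns at hand.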
Re: [HACKERS] Policy on pulling in code from other projects?
On 07/21/2011 11:13 AM, Tom Lane wrote: "Joshua D. Drake" writes: So I am looking intently on what it is going to take to get the URI patch done for psql [1] and was digging around the web and have a URI parser library. It is under the New BSD license and is strictly RFC RFC 3986 [2] compliant . Surely we do not need a whole library to parse URIs. Also: http://uriparser.git.sourceforge.net/git/gitweb-index.cgi Sincerely, Joshua D. Drake regards, tom lane -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development The PostgreSQL Conference - http://www.postgresqlconference.org/ @cmdpromptinc - @postgresconf - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Policy on pulling in code from other projects?
On 07/21/2011 11:13 AM, Tom Lane wrote: "Joshua D. Drake" writes: So I am looking intently on what it is going to take to get the URI patch done for psql [1] and was digging around the web and have a URI parser library. It is under the New BSD license and is strictly RFC RFC 3986 [2] compliant . Surely we do not need a whole library to parse URIs. Shrug, standards compliant, already runs on windows Seems like a good idea to me? http://uriparser.sourceforge.net/ Sincerely, Joshua D. Drake regards, tom lane -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development The PostgreSQL Conference - http://www.postgresqlconference.org/ @cmdpromptinc - @postgresconf - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Policy on pulling in code from other projects?
"Joshua D. Drake" writes: > So I am looking intently on what it is going to take to get the URI > patch done for psql [1] and was digging around the web and have a URI > parser library. It is under the New BSD license and is strictly RFC RFC > 3986 [2] compliant . Surely we do not need a whole library to parse URIs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Policy on pulling in code from other projects?
On 21 July 2011 18:43, Joshua D. Drake wrote: > So I am looking intently on what it is going to take to get the URI patch > done for psql [1] and was digging around the web and have a URI parser > library. It is under the New BSD license and is strictly RFC RFC 3986 [2] > compliant . > > Now I have not dug into the code but the parser is used by other projects. > So question is: > > Assuming the code actually makes this patch easier, do we: > > A. Pull in the code into the main tree > B. Instead have it as a requirement via configure? > > 1. > http://archives.postgresql.org/message-id/1302114698.23164.17.camel@jd-desktop > > 2. http://tools.ietf.org/html/rfc3986 Without commenting on the practicalities of what you'd like to do, including code from other projects in the tree is well precedented. Off the top of my head, I can tell you that pgcrypto uses code from various sources, while preserving the original copyright notices. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Excerpts from Simon Riggs's message of jue jul 21 13:30:25 -0400 2011: > I think I'll just revert it though. Subtransactions need a lot of > tuning but this isn't high enough up my list to be worth the work. If it works and is sane, why would you revert it? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fixing PQsetvalue()
On Thu, Jul 21, 2011 at 12:19 PM, Tom Lane wrote: > Robert Haas writes: >> So I finally got around to taking a look at this patch, and I guess my >> basic feeling is that I like it. The existing code is pretty weird >> and inconsistent: the logic in PQsetvalue() basically does the same >> thing as the logic in pqAddTuple(), but incompatibly and less >> efficiently. Unifying them seems sensible, and the fix looks simple >> enough to back-patch. > > Yeah, I've been looking at it too. For some reason I had had the > idea that the proposed patch complicated the code, but actually it's > simplifying it by removing almost-duplicate code. So that's good. > > The patch as proposed adds back a bug in return for the one it fixes > (you can not free() the result of pqResultAlloc()), but that's easily > fixed. > > Will fix and commit. Cool. I believe that's the last patch for CommitFest 2011-06. *bangs gavel* I believe that makes it time for 9.2alpha1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Policy on pulling in code from other projects?
Hey, So I am looking intently at what it is going to take to get the URI patch done for psql [1] and was digging around the web and found a URI parser library. It is under the New BSD license and is strictly RFC 3986 [2] compliant. Now I have not dug into the code but the parser is used by other projects. So the question is: Assuming the code actually makes this patch easier, do we: A. Pull in the code into the main tree B. Instead have it as a requirement via configure? 1. http://archives.postgresql.org/message-id/1302114698.23164.17.camel@jd-desktop 2. http://tools.ietf.org/html/rfc3986 Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development The PostgreSQL Conference - http://www.postgresqlconference.org/ @cmdpromptinc - @postgresconf - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
On Thu, Jul 21, 2011 at 5:56 PM, Tom Lane wrote: > Heikki Linnakangas writes: >> Instead of leaving the locks dangling to an already-destroyed resource >> owner, how about assigning all locks directly to the top-level resource >> owner in one sweep? That'd still be much better than the old way of >> recursively reassigning them up the subtransaction tree, one level at a >> time. > > I haven't actually read the patch, but the reason for pushing them up > only one level at a time is that if an intermediate-level subtransaction > aborts, the locks taken by its child subtransactions have to be released > at that time. It sure sounds like this patch broke that. The only path altered by the patch was the final-commit-while-in-a-subxact, so I don't see a problem in the part you mention. At commit all the locks get transferred to the parent, so we scan the lock table repeatedly, giving O(N^2). I think I'll just revert it though. Subtransactions need a lot of tuning but this isn't high enough up my list to be worth the work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
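The quadratic cost Simon mentions is easy to model: with N nested subtransaction levels each holding a lock, reassigning locks up one level at a time revisits the accumulated locks at every level, while a single sweep to the top-level owner touches each lock once. A toy cost model (not the real lock manager code; function names are made up):

```c
#include <stdint.h>

/*
 * Toy model: N nested subtransactions, each owning one lock.  Reassigning
 * locks up one level at a time handles `level` locks at each of the N
 * levels as commits unwind, so total work is 1 + 2 + ... + N = O(N^2).
 */
static uint64_t
scans_one_level_at_a_time(uint64_t nlevels)
{
    uint64_t work = 0;

    for (uint64_t level = 1; level <= nlevels; level++)
        work += level;          /* locks accumulated at this depth */
    return work;
}

/* Transferring everything to the top-level owner in one sweep is O(N). */
static uint64_t
scans_single_sweep(uint64_t nlevels)
{
    return nlevels;
}
```

At a thousand SAVEPOINTs the difference is roughly 500,000 lock visits versus 1,000, which is why the one-sweep approach Heikki suggests keeps the win without the dangling-owner problem.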
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Heikki Linnakangas writes: > Instead of leaving the locks dangling to an already-destroyed resource > owner, how about assigning all locks directly to the top-level resource > owner in one sweep? That'd still be much better than the old way of > recursively reassigning them up the subtransaction tree, one level at a > time. I haven't actually read the patch, but the reason for pushing them up only one level at a time is that if an intermediate-level subtransaction aborts, the locks taken by its child subtransactions have to be released at that time. It sure sounds like this patch broke that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Single pass vacuum - take 1
On Thu, Jul 21, 2011 at 12:17 PM, Robert Haas wrote: > On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas > > I think you can sidestep that > > if you check that the page's vacuum LSN <= vacuum LSN in pg_class, > instead > > of equality. > > I don't think that works, because the point of storing the LSN in > pg_class is to verify that the vacuum completed the index cleanup > without error. The fact that a newer vacuum accomplished that goal > does not mean that all older ones did. > > Since we force the subsequent vacuum to also look at the pages scanned and pruned by a previous failed vacuum, all the pages that have dead-vacuum line pointers would get a new stamp once the vacuum finishes successfully, and pg_class would have the same stamp. > > Ignoring the issue stated in previous paragraph, I think you wouldn't > > actually need a 64-bit LSN. A smaller counter is enough, as wrap-around > > doesn't matter. In fact, a single bit would be enough. After a successful > > vacuum, the counter on each heap page (with dead line pointers) is N, and > > the value in pg_class is N. There are no other values on the heap, > because > > vacuum will have cleaned them up. When you begin the next vacuum, it will > > stamp pages with N+1. So at any stage, there is only one of two values on > > any page, so a single bit is enough. (But as I said, that doesn't hold if > > vacuum skips some pages thanks to the visibility map) > > If this can be made to work, it's a very appealing idea. I thought more about it and for a moment believed that we could do this with just a bit since we rescan the pages with dead and dead-vacuum line pointers after an aborted vacuum, but concluded that a bit or a small counter is not good enough since other backends might be running with a stale value and would get fooled into believing that they can collect the dead-vacuum line pointers before the index pointers are actually removed. 
We can still use a 32-bit counter though, since the wrap-around period is far too long for any backend to still be running with such a stale counter (you would need more than 1 billion vacuums on the same table in between to hit this). > The patch as > submitted uses lp_off to store a single bit, to distinguish between > vacuum and dead-vacuumed, but we could actually have (for greater > safety and debuggability) a 15-byte counter that just wraps around > from 32,767 to 1. (Maybe it would be wise to reserve a few counter > values, or a few bits, or both, for future projects.) That would > eliminate the need to touch PageRepairFragmentation() or use the > special space, since all the information would be in the line pointer > itself. Not having to rearrange the page to reclaim dead line > pointers is appealing, too. > > Not sure if I get you here. We need a mechanism to distinguish between dead and dead-vacuum line pointers. How would the counter (which I assume you mean 15-bit and not byte) help solve that? Or are you just suggesting replacing LSN with the counter in the page header? > > Is there something in place to make sure that pruning uses an up-to-date > > relindxvacxlogid/off value? I guess it doesn't matter if it's > out-of-date, > > you'll just miss the opportunity to remove some dead tuples. > > This seems like a tricky problem, because it could cause us to > repeatedly fail to remove the same dead line pointers, which would be > poor. We could do something like this: after updating pg_class, > vacuum send an interrupt to any backend which holds RowExclusiveLock > or higher on that relation. The interrupt handler just sets a flag. > If that backend does heap_page_prune() and sees the flag set, it knows > that it needs to recheck pg_class. This is a bit grotty and doesn't > completely close the race condition (the signal might not arrive in > time), but it ought to make it narrow enough not to matter in > practice. 
> > I am not too excited about adding that complexity to the code. Even if a backend does not have an up-to-date value, it will fail to collect the dead-vacuum pointers, but soon either it will catch up or some other backend will remove them or the next vacuum will take care of it. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
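For what it's worth, if one did want the 32-bit counter to stay comparable across a wrap even for arbitrarily stale backends, the usual modular-arithmetic trick PostgreSQL applies to transaction IDs (as in TransactionIdPrecedes) works here too. A sketch with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-around-aware comparison of two 32-bit vacuum-generation counters:
 * "a precedes b" iff a lies within the 2^31 values logically before b.
 * The unsigned subtraction wraps, and reinterpreting the difference as
 * signed gives the modular ordering; this mirrors how PostgreSQL
 * compares transaction IDs.
 */
static bool
vacgen_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}
```

With this, a counter that has wrapped from 4294967295 to small values still compares as "newer" than the pre-wrap values, so a stale backend cannot mistake an old stamp for a current one.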
Re: [HACKERS] Single pass vacuum - take 1
On Thu, Jul 21, 2011 at 11:51 AM, Robert Haas wrote: > On Tue, Jul 12, 2011 at 4:47 PM, Pavan Deolasee > wrote: > > Comments ? > > I was going to spend some time reviewing this, but I see that (1) it > has bit-rotted slightly - there is a failing hunk in pg_class.h and > (2) some of the comments downthread seem to suggest that you're > thinking about whether to revise this somewhat, in particular by using > some counter other than an LSN. Are you planning to submit an updated > version? > > Yeah, I would submit an updated version. I was just waiting to see if there are more comments about the general design. But I think I can now proceed. I wonder if we can just ignore the wrap-around issue and use a 32-bit counter. The counter can be stored in pg_class itself since its use is limited to the given table. At the start of vacuum, we get the current value. We then increment the counter (taking care of wrap-around) and use the incremented value as a marker in the page special area. If the vacuum runs to completion, we store the new value back in the pg_class row. Since vacuums are serialized for a given table, we don't need to worry about concurrent updates to the value. While collecting dead-vacuum line pointers, either during HOT-prune or subsequent vacuum, we check the current pg_class value, and if it is equal to the page counter, we can safely collect the dead-vacuum line pointers. For a moment, I thought we can just do away with a bit as Heikki suggested up thread, but the problem comes with the backends which might be running with a stale value of the counter in pg_class, and the counter should be large enough so that it does not quickly wrap around for all practical purposes.
It still refers to > lazy_vacuum_heap, which you've meanwhile removed. You need to rewrite > the whole comment, I think. > > - Instead of passing bool need_vaclsn to PageRepairFragmentation(), > how about passing Offset new_special_size? Currently, > PageRepairFragmentation() doesn't know whether it's looking at a heap > page or an index page, and it would be nice to keep it that way. It's > even possible that expanding the special space opportunistically > during page defragmentation could be useful in other contexts besides > this. Or perhaps contracting it. > > - itemid.h seems a bit schizophrenic about dead line pointers. Here, > you've decided that it's OK for lp_flags == LP_DEAD && lp_off == 1 to > mean dead-vacuumed, but the existing code says: > > #define LP_DEAD 3 /* dead, may or may > not have storage */ > > AFAICT, the actual situation here is that indexes sometimes use dead > line pointers with storage, but the heap doesn't; thus, the heap can > safely use the storage bits of dead line pointers to mean something > else, but indexes can't. I think the comments throughout itemid.h > should be adjusted to bring this out a bit more clearly, though. > > I will take care of these issues in the revised patch. Thanks for looking at the patch. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
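The encoding under discussion (reusing the storage bits of heap-only LP_DEAD line pointers) can be pictured with a struct mirroring ItemIdData's bitfield layout. The helper name and the choice of lp_off == 1 follow the patch description in the thread; this is an illustration, not the patch's code:

```c
#include <stdbool.h>

/* Layout matching ItemIdData: 15-bit offset, 2-bit flags, 15-bit length. */
typedef struct LinePointer
{
    unsigned lp_off   : 15;
    unsigned lp_flags : 2;
    unsigned lp_len   : 15;
} LinePointer;

#define LP_DEAD 3

/*
 * On a heap page an LP_DEAD line pointer has no storage, so lp_off
 * carries no offset and its bits are free to carry extra state; here
 * lp_off == 1 marks "dead-vacuumed" (index pointers already removed).
 * This trick is NOT valid for index pages, where dead line pointers
 * may still have storage.
 */
static bool
lp_is_dead_vacuumed(LinePointer lp)
{
    return lp.lp_flags == LP_DEAD && lp.lp_off == 1;
}
```

Replacing the single marker value 1 with a small wrapping counter, as discussed downthread, would use the same free bits.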
Re: [HACKERS] fixing PQsetvalue()
Robert Haas writes: > So I finally got around to taking a look at this patch, and I guess my > basic feeling is that I like it. The existing code is pretty weird > and inconsistent: the logic in PQsetvalue() basically does the same > thing as the logic in pqAddTuple(), but incompatibly and less > efficiently. Unifying them seems sensible, and the fix looks simple > enough to back-patch. Yeah, I've been looking at it too. For some reason I had had the idea that the proposed patch complicated the code, but actually it's simplifying it by removing almost-duplicate code. So that's good. The patch as proposed adds back a bug in return for the one it fixes (you can not free() the result of pqResultAlloc()), but that's easily fixed. Will fix and commit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Single pass vacuum - take 1
On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas wrote: > How does this interact with the visibility map? If you set the visibility > map bit after vacuuming indexes, a subsequent vacuum will not visit the > page. The second vacuum will update relindxvacxlogid/off, but it will not > clean up the dead line pointers left behind by the first vacuum. Now the LSN > on the page differs from the one stored in pg_class, so subsequent pruning > will not remove the dead line pointers either. Currently, I think we would only set the visibility map bit after vacuuming the page for the second time. The patch as submitted doesn't appear to go back and set visibility map bits after finishing the index vacuum. Now, that might be nice to do, because then a hypothetical index-only scan could start taking advantage of vacuum having been done sooner. If we wanted to do that, we could restructure the visibility map to store two bits per page: one to indicate whether there is any potential work for VACUUM to do (modulo freezing) and the other to indicate whether an index pointer could possibly be aimed at a dead line pointer. (In fact, maybe we'd even want to have a third bit to indicate "all tuples frozen", which would be useful for optimizing anti-wraparound vacuum.) > I think you can sidestep that > if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead > of equality. I don't think that works, because the point of storing the LSN in pg_class is to verify that the vacuum completed the index cleanup without error. The fact that a newer vacuum accomplished that goal does not mean that all older ones did. > Ignoring the issue stated in previous paragraph, I think you wouldn't > actually need an 64-bit LSN. A smaller counter is enough, as wrap-around > doesn't matter. In fact, a single bit would be enough. After a successful > vacuum, the counter on each heap page (with dead line pointers) is N, and > the value in pg_class is N. 
There are no other values on the heap, because > vacuum will have cleaned them up. When you begin the next vacuum, it will > stamp pages with N+1. So at any stage, there is only one of two values on > any page, so a single bit is enough. (But as I said, that doesn't hold if > vacuum skips some pages thanks to the visibility map) If this can be made to work, it's a very appealing idea. The patch as submitted uses lp_off to store a single bit, to distinguish between dead and dead-vacuumed, but we could actually have (for greater safety and debuggability) a 15-byte counter that just wraps around from 32,767 to 1. (Maybe it would be wise to reserve a few counter values, or a few bits, or both, for future projects.) That would eliminate the need to touch PageRepairFragmentation() or use the special space, since all the information would be in the line pointer itself. Not having to rearrange the page to reclaim dead line pointers is appealing, too. > Is there something in place to make sure that pruning uses an up-to-date > relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date, > you'll just miss the opportunity to remove some dead tuples. This seems like a tricky problem, because it could cause us to repeatedly fail to remove the same dead line pointers, which would be poor. We could do something like this: after updating pg_class, vacuum sends an interrupt to any backend which holds RowExclusiveLock or higher on that relation. The interrupt handler just sets a flag. If that backend does heap_page_prune() and sees the flag set, it knows that it needs to recheck pg_class. This is a bit grotty and doesn't completely close the race condition (the signal might not arrive in time), but it ought to make it narrow enough not to matter in practice. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sinval synchronization considered harmful
Robert Haas wrote: > SIGetDataEntries(). I believe we need to bite the bullet and > rewrite this using a lock-free algorithm, using memory barriers on > processors with weak memory ordering. > [32 processors; 80 clients] > On unpatched master > tps = 132518.586371 (including connections establishing) > tps = 130968.749747 (including connections establishing) > tps = 132574.338942 (including connections establishing) > With the lazy vxid locks patch > tps = 119215.958372 (including connections establishing) > tps = 113056.859871 (including connections establishing) > tps = 160562.770998 (including connections establishing) > gets rid of SInvalReadLock and instead gives each backend its own > spinlock. > tps = 167392.042393 (including connections establishing) > tps = 171336.145020 (including connections establishing) > tps = 170500.529303 (including connections establishing) > SIGetDataEntries() can pretty easily be made lock-free. > tps = 203256.701227 (including connections establishing) > tps = 190637.957571 (including connections establishing) > tps = 190228.617178 (including connections establishing) > Thoughts? Comments? Ideas? Very impressive! Those numbers definitely justify some #ifdef code to provide alternatives for weak memory ordering machines versus others. With the number of CPUs climbing as it is, this is very important work! -Kevin
Re: [HACKERS] fixing PQsetvalue()
On Wed, Jul 20, 2011 at 10:28 PM, Robert Haas wrote: > On Mon, Jul 18, 2011 at 6:38 AM, Pavel Golub wrote: >> Hello, Merlin. >> >> I hope it's OK that I've added Andrew's patch to CommitFest: >> https://commitfest.postgresql.org/action/patch_view?id=606 >> >> I did this because beta3 was already released, but nothing has been done about >> this bug. > > So I finally got around to taking a look at this patch, and I guess my > basic feeling is that I like it. The existing code is pretty weird > and inconsistent: the logic in PQsetvalue() basically does the same > thing as the logic in pqAddTuple(), but incompatibly and less > efficiently. Unifying them seems sensible, and the fix looks simple > enough to back-patch. > > With respect to Tom's concern about boxing ourselves in, I guess it's > hard for me to get worried about that. I've heard no one suggest > changing the internal representation libpq uses for result sets, and > even if we did, presumably the new format would also need to support > an "append a tuple" operation - or at the very worst we could cause it to > support that without much difficulty. > > So, +1 from me. right -- thanks for that. For the record, I think a rework of the libpq internal representation would be likely to happen concurrently with a rework of the API -- for example, to better support streaming data. PQsetvalue very well might prove to be a headache -- it's just too hard to say. libpq strikes me as a 50-plus-year marriage might: fractious, full of mystery and regrets, but highly functional. merlin
Re: [HACKERS] timing for 9.1beta4 / rc1
Robert Haas wrote: > http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items is > looking pretty barren at the moment. Are there things that should > be listed there? I don't know of any. The non-blocking item about paring down Dan's latest isolation test case to a size which can be committed is something I won't be able to get to for at least a couple weeks -- at least if I want to keep drawing paychecks from my employer. If someone else took that, I wouldn't complain. Meanwhile, I am running the test on my workstation regularly. -Kevin
Re: [HACKERS] Single pass vacuum - take 1
On Tue, Jul 12, 2011 at 4:47 PM, Pavan Deolasee wrote: > Comments? I was going to spend some time reviewing this, but I see that (1) it has bit-rotted slightly - there is a failing hunk in pg_class.h, and (2) some of the comments downthread seem to suggest that you're thinking about whether to revise this somewhat, in particular by using some counter other than an LSN. Are you planning to submit an updated version? A few comments on this version just reading through it: - In lazy_scan_heap, where you've made the call to RecordPageWithFreeSpace() unconditional, the comment change you made immediately above is pretty half-baked. It still refers to lazy_vacuum_heap, which you've meanwhile removed. You need to rewrite the whole comment, I think. - Instead of passing bool need_vaclsn to PageRepairFragmentation(), how about passing Offset new_special_size? Currently, PageRepairFragmentation() doesn't know whether it's looking at a heap page or an index page, and it would be nice to keep it that way. It's even possible that expanding the special space opportunistically during page defragmentation could be useful in other contexts besides this. Or perhaps contracting it. - itemid.h seems a bit schizophrenic about dead line pointers. Here, you've decided that it's OK for lp_flags == LP_DEAD && lp_off == 1 to mean dead-vacuumed, but the existing code says: #define LP_DEAD 3 /* dead, may or may not have storage */ AFAICT, the actual situation here is that indexes sometimes use dead line pointers with storage, but the heap doesn't; thus, the heap can safely use the storage bits of dead line pointers to mean something else, but indexes can't. I think the comments throughout itemid.h should be adjusted to bring this out a bit more clearly, though. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PL/Python: No stack trace for an exception
On Thu, 2011-07-21 at 15:31 +0200, Jan Urbański wrote: > On 21/07/11 15:27, Sushant Sinha wrote: > > I am using plpythonu on postgres 9.0.2. One of my Python functions was > > throwing a TypeError exception. However, I only see the exception in the > > database and not the stack trace. It becomes difficult to debug if the > > stack trace is absent in Python. > > > > logdb=# select get_words(forminput) from fi; > > ERROR: PL/Python: TypeError: an integer is required > > CONTEXT: PL/Python function "get_words" > > > > And here is the error if I run that function on the same data in python: > > > > [traceback] > > > > Is this a known problem, or does it need addressing? > > Yes, traceback support in PL/Python has already been implemented and is > a new feature that will be available in PostgreSQL 9.1. > > Cheers, > Jan Thanks Jan! Just one more reason to try 9.1.
Re: [HACKERS] PL/Python: No stack trace for an exception
On 21/07/11 15:27, Sushant Sinha wrote: > I am using plpythonu on postgres 9.0.2. One of my Python functions was > throwing a TypeError exception. However, I only see the exception in the > database and not the stack trace. It becomes difficult to debug if the > stack trace is absent in Python. > > logdb=# select get_words(forminput) from fi; > ERROR: PL/Python: TypeError: an integer is required > CONTEXT: PL/Python function "get_words" > > And here is the error if I run that function on the same data in python: > > [traceback] > > Is this a known problem, or does it need addressing? Yes, traceback support in PL/Python has already been implemented and is a new feature that will be available in PostgreSQL 9.1. Cheers, Jan
Re: [HACKERS] PL/Python: No stack trace for an exception
On 21 July 2011 14:27, Sushant Sinha wrote: > I am using plpythonu on postgres 9.0.2. One of my Python functions was > throwing a TypeError exception. However, I only see the exception in the > database and not the stack trace. It becomes difficult to debug if the > stack trace is absent in Python. > > logdb=# select get_words(forminput) from fi; > ERROR: PL/Python: TypeError: an integer is required > CONTEXT: PL/Python function "get_words" > > > And here is the error if I run that function on the same data in python: > > Traceback (most recent call last): > File "valid.py", line 215, in > parse_query(result['forminput']) > File "valid.py", line 132, in parse_query > dateobj = datestr_to_obj(columnHash[column]) > File "valid.py", line 37, in datestr_to_obj > dateobj = datetime.date(words[2], words[1], words[0]) > TypeError: an integer is required > > > Is this a known problem, or does it need addressing? Traceback information will be added to PL/Python errors in Postgres 9.1, due out in about September. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] PL/Python: No stack trace for an exception
I am using plpythonu on postgres 9.0.2. One of my Python functions was throwing a TypeError exception. However, I only see the exception in the database and not the stack trace. It becomes difficult to debug if the stack trace is absent in Python. logdb=# select get_words(forminput) from fi; ERROR: PL/Python: TypeError: an integer is required CONTEXT: PL/Python function "get_words" And here is the error if I run that function on the same data in python: Traceback (most recent call last): File "valid.py", line 215, in parse_query(result['forminput']) File "valid.py", line 132, in parse_query dateobj = datestr_to_obj(columnHash[column]) File "valid.py", line 37, in datestr_to_obj dateobj = datetime.date(words[2], words[1], words[0]) TypeError: an integer is required Is this a known problem, or does it need addressing? Thanks, Sushant.
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
On Thu, Jul 21, 2011 at 4:00 AM, Yeb Havinga wrote: > Is there a way to dump syscache statistics like there is for > MemoryContextStats (gdb helped me there)? I don't know of one. > Besides that, I have to admit having problems understanding why the 5MB cache > for pg_seclabel is a problem; its memory consumption is linear in the > size of the underlying database (in contrast with the other cache, storing > access vectors, which would have O(n*m) space complexity if it didn't > reclaim space). So it is proportional to the number of objects in a database, > and in size it seems to be in the same order as pg_proc, pg_class and > pg_attribute. Fair enough. I'm not convinced that the sheer quantity of memory use is a problem, although I would like to see a few more test results before we decide that definitively. I *am* unwilling to pay the startup overhead of initializing an extra 2048-bucket syscache that only sepgsql users will actually need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] timing for 9.1beta4 / rc1
All, 9.1beta3 was tagged on July 7th and announced July 12th. In the interest of avoiding institutional inertia, should we try to set a tentative date for a next beta or, if we don't end up fixing too many bugs in the meanwhile, perhaps a release candidate? Obviously we want to give people some time to shake out bugs in beta3, but I think it would be good to get our next 9.1 release (whatever it ends up being named) out the door sometime in August, and we'll be more likely to actually do that if we pick a date to shoot for. On a related note, http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items is looking pretty barren at the moment. Are there things that should be listed there? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Environment checks prior to regression tests?
On Thu, Jul 21, 2011 at 6:16 AM, Kohei Kaigai wrote: > How about adding the ability to launch an environment checker (typically a > shell script) prior to the regression tests? > > The following items must be preconfigured to run sepgsql's regression test: > - SELinux must be running and configured in enforcing mode. > - The sepgsql-regtest policy module must be loaded. > - The sepgsql_regression_test_mode boolean must be turned on. > - The psql command should be labeled as 'bin_t'. > > If installcheck optionally allowed launching an environment checker before the > regression test, we could suggest that users fix up their configuration. That > seems quite helpful to me. > > For example, one idea is to inject a dummy variable (initialized to empty by > default) as a dependency of installcheck, which a contrib module's Makefile > could override, as follows: > > # against installed postmaster > installcheck: submake $(REGRESS_PRE) > $(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS) Seems reasonable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Commitfest Status: Sudden Death Overtime
On Thu, 2011-07-21 at 00:21 +0200, Florian Pflug wrote: > There's a small additional concern, though, which is that there's an > XPath 2.0 spec out there, and it modifies the type system and data model > rather heavily. So before we go adding functions, it'd probably be wise > to check that we're not painting ourselves into a corner. Why not just write an XPATH2() function conforming to the XPath 2.0 spec if the new spec is substantially different? -- --- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
[HACKERS] Environment checks prior to regression tests?
How about adding the ability to launch an environment checker (typically a shell script) prior to the regression tests? The following items must be preconfigured to run sepgsql's regression test: - SELinux must be running and configured in enforcing mode. - The sepgsql-regtest policy module must be loaded. - The sepgsql_regression_test_mode boolean must be turned on. - The psql command should be labeled as 'bin_t'. If installcheck optionally allowed launching an environment checker before the regression test, we could suggest that users fix up their configuration. That seems quite helpful to me. For example, one idea is to inject a dummy variable (initialized to empty by default) as a dependency of installcheck, which a contrib module's Makefile could override, as follows: # against installed postmaster installcheck: submake $(REGRESS_PRE) $(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS) Thanks, -- NEC Europe Ltd, SAP Global Competence Center KaiGai Kohei
Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache
On 2011-07-21 00:08, Robert Haas wrote: > On Wed, Jul 20, 2011 at 4:48 PM, Tom Lane wrote: >> Kohei Kaigai writes: >>> I'd like to have a discussion about syscache towards the next commit-fest. >>> The issues may be: >>> - The initial bucket allocation is in most cases never referenced. >>> - Reclaiming cache entries when the cache grows too large. >> >> There used to be support for limiting the number of entries in a syscache. >> It got removed (cf commit 8b9bc234ad43dfa788bde40ebf12e94f16556b7f) because >> (1) it was remarkably expensive to do it (extra list manipulations, etc), and >> (2) performance tended to fall off a cliff as soon as you had a few more >> tables or whatever than the caches would hold. I'm disinclined to reverse >> that decision. It appears to me that the security label stuff needs a >> different set of performance tradeoffs than the rest of the catalogs, which >> means it probably ought to do its own caching, rather than trying to talk us >> into pessimizing the other catalogs for seclabel's benefit. > > I agree that we don't want to limit the size of the catcaches. We've been > careful to design them in such a way that they won't blow out memory, and so > far there's no evidence that they do. If it ain't broke, don't fix it. Having catcaches that can grow in size as needed sounds useful to me, though. Is there a way to dump syscache statistics like there is for MemoryContextStats (gdb helped me there)? Besides that, I have to admit having problems understanding why the 5MB cache for pg_seclabel is a problem; its memory consumption is linear in the size of the underlying database (in contrast with the other cache, storing access vectors, which would have O(n*m) space complexity if it didn't reclaim space). So it is proportional to the number of objects in a database, and in size it seems to be in the same order as pg_proc, pg_class and pg_attribute. 
regards, -- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data
Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper
Heikki Linnakangas wrote: >> 2) If I decide to close remote database connections after >> use, I would like to do so where reasonable. >> I would like to keep the connection open between query >> planning and query execution and close it when the >> scan is done. >> The exception could be named prepared statements. >> Is there a way to tell if that is the case during >> planning or execution? > Hmm, maybe you could add a hook to close the connection when the > transaction ends. But actually, you'd want to keep the connection > open across transactions too. Some sort of a general connection > caching facility would be useful for many FDWs. I agree, and that is how I implemented it at the moment. But it might be nice to give the user the option, say, if they know that it is a long session in a daemon process that accesses the remote table only once a day. I'll look into the hook option. Here are some more ideas for FDW API functions/macros that might be useful for FDW developers. - A function that gives you the internal and external parameters at execution time. - A function that gives you a type's input and output function. - A function that gives you the OID of the foreign table owner. - A function that gives you the list of columns of the foreign table (atttypid, atttypmod, attname, maybe others). Yours, Laurenz Albe