Re: [HACKERS] BGWriter latch, power saving
On Wed, Jan 4, 2012 at 7:24 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> Setting a latch that's already set is fast, but surely it's even faster
> to not even try.

Agreed.

I think we should SetLatch() at the first point a backend writes a dirty buffer because the bgwriter has been inactive. Continually waking the bgwriter makes it harder for it to return to sleep.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
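A minimal sketch of the wake-only-when-idle idea discussed above. All names here are hypothetical stand-ins, not PostgreSQL's actual symbols; the point is simply that the latch is set only on the transition into an idle period, not on every dirty write:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared state: the bgwriter sets this before sleeping. */
bool bgwriter_idle = true;

/* Stand-in for SetLatch(); counts wakeups for illustration. */
int latch_set_count = 0;
void set_latch(void) { latch_set_count++; }

/* Called by a backend when it dirties a buffer. */
void mark_buffer_dirty(void)
{
    if (bgwriter_idle)
    {
        bgwriter_idle = false;  /* claim the wakeup for this idle period */
        set_latch();            /* one SetLatch() per idle period */
    }
    /* ... actual buffer dirtying work would happen here ... */
}
```

Subsequent dirty writes in the same idle period skip the SetLatch() call entirely, which is the "even faster to not even try" behaviour Heikki describes.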
Re: [HACKERS] 16-bit page checksums for 9.2
On Tue, Jan 3, 2012 at 11:00 PM, Jim Nasby <j...@nasby.net> wrote:
> On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
>> (2) I'm not sure about doing this in three parts, to skip the checksum
>> itself and the hole in the middle of the page. Is this because the hole
>> might not have predictable data? Why would that matter, as long as it
>> is read back the same?
>
> IMO not checksumming the free space would be a really bad idea. It's
> entirely possible to have your hardware crapping on your free space, and
> I'd still want to know that that was happening. Now, it would be
> interesting if the free space could be checksummed separately, since
> there's no reason to refuse to read the page if only the free space is
> screwed up... But given the choice, I'd rather get an error when the
> free space is corrupted and be forced to look into things rather than
> blissfully ignore corrupted free space only to be hit later with real
> data loss.

I see that argument. We don't have space for two checksums, so we can either

(1) report all errors on a page, including errors that don't change PostgreSQL data. This involves checksumming long strings of zeroes, which the checksum algorithm can't tell apart from long strings of ones.

(2) report only errors that changed PostgreSQL data.

We already do (1) for WAL CRCs, so doing the same thing for page checksums makes sense and is much faster. If enough people think we should do (2), then it's a simple change to the patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 16-bit page checksums for 9.2
On Tue, Jan 3, 2012 at 10:21 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> I'm happy with how this looks, except (as noted in a code comment) that
> there seems to be room for optimization of the calculation itself.
> Details below:
>
> ...
>
> (3) Rather than having PageSetVerificationInfo() use memcpy, followed by
> a pass through the copied data to calculate the checksum, might it be
> better to have a "copy and calculate" version of the function (as in
> VMware's original patch) to save an extra pass over the page image?
>
> Other than these performance tweaks around the calculation phase, I
> didn't spot any problems. I beat up on it a bit on a couple of machines
> without hitting any bugs or seeing any regression test failures.

My focus was on getting something working first, then tuning. If we're agreed that we have everything apart from the tuning, then we can proceed with tests to see which works better.

The copy-and-calculate approach might get in the way of hardware prefetch, since in my understanding the memory fetch time exceeds the calculation time. As discussed elsewhere, using that code or not would not stop that work being credited.

David, please can you rework the VMware calc patch to produce an additional 16-bit checksum mechanism in a way compatible with the 16-bit patch, so we can test the two versions of the calculation? We can make the GUC an enum so that the page checksum is selectable (for testing).

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 12:47 AM, David E. Wheeler wrote:
> Is this perhaps by design? Oy, this doesn't look good:
>
>     $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
>     ERROR:  server conn crashed?
>     ERROR:  server conn crashed?
>     The connection to the server was lost. Attempting reset: Succeeded.
>     (pgxn@localhost:5900) 06:44:42 [pgxn]
>     $

Try

    elog(NOTICE, "$^V")

cheers

andrew
Re: [HACKERS] 16-bit page checksums for 9.2
On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
> (1) I like the choice of Fletcher-16. It should be very good at
> detecting problems while being a lot less expensive than an official
> CRC calculation.

I wonder if CRC32c wouldn't be a good alternative, given that more and more CPUs support calculating it in silicon (it's in SSE 4.2).

Andres
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund <and...@anarazel.de> wrote:
> On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
>> (1) I like the choice of Fletcher-16. It should be very good at
>> detecting problems while being a lot less expensive than an official
>> CRC calculation.
>
> I wonder if CRC32c wouldn't be a good alternative, given that more and
> more CPUs support calculating it in silicon (it's in SSE 4.2).

We're trying to get something that fits in 16 bits for this release. I'm guessing CRC32c doesn't?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jim Nasby <j...@nasby.net> writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>>
>> Couldn't we just leave the buffers alone? Once an index is dropped and
>> that's pushed out through the catalog then nothing should be trying to
>> access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file. It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.

My patch puts things on the freelist only when it is free to do so. Not having a freelist at all is probably a simpler way of avoiding the lock contention, so I'll happily back that suggestion instead. Patch attached, previous patch revoked.

> Simon is after a different solution involving getting rid of the clock
> sweep...

err, no, he isn't. Not sure where that came from, since I'm advocating only minor changes there to curb worst-case behaviour. But let's discuss that on the main freelist thread.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 		 * Initially link all the buffers together as unused. Subsequent
 		 * management of this list is done by freelist.c.
 		 */
-		buf->freeNext = i + 1;
+		StrategyInitFreelistBuffer(buf);
 
 		buf->io_in_progress_lock = LWLockAssign();
 		buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer;		/* Tail of list of unused buffers */
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics. These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl->numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the clock sweep algorithm */
+	/* Run the clock sweep algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
 		 */
-		LockBufHdr(buf);
 		if (buf->refcount == 0)
 		{
-			if (buf->usage_count > 0)
+			if (buf->usage_count > StrategyControl->completePasses)
 			{
 				buf->usage_count--;
 				trycounter = NBuffers;
 			}
 			else
 			{
-				/* Found a usable buffer */
-				if (strategy != NULL)
-					AddBufferToRing(strategy, buf);
-				return buf;
+				LockBufHdr(buf);
+				if (buf->refcount == 0)
+				{
+					UnlockBufHdr(buf);
+					/* Found a usable buffer */
+					if (strategy != NULL)
+						AddBufferToRing(strategy, buf);
+					return buf;
+				}
+				UnlockBufHdr(buf);
 			}
 		}
 		else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			 * probably better to fail than to risk getting stuck in an
 			 * infinite loop.
 			 */
-			UnlockBufHdr(buf);
 			elog(ERROR, "no unpinned buffers available");
 		}
-		UnlockBufHdr(buf);
 	}
 
 	/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 3:49 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> On 12/29/11, Ants Aasma <ants.aa...@eesti.ee> wrote:
>>> Unless I'm missing something, double-writes are needed for all writes,
>>> not only the first page after a checkpoint. Consider this sequence of
>>> events:
>>> 1. Checkpoint
>>> 2. Double-write of page A (DW buffer write, sync, heap write)
>>> 3. Sync of heap, releasing DW buffer for new writes
>>> ... some time goes by ...
>>> 4. Regular write of page A
>>> 5. OS writes one part of page A
>>> 6. Crash!
>>> Now recovery comes along, page A is broken in the heap with no
>>> double-write buffer backup nor anything to recover it by in the WAL.
>>
>> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
>> really a regular write as it is the first one after a checkpoint?
>
> I think you nailed it.

No, I should have explicitly stated that no checkpoint happens in between. I think the confusion here is because I assumed Kevin described a fixed-size d-w buffer in this message:

On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> ... The file is fsync'd (like I said, hopefully to BBU cache), then each
> page in the double-write buffer is written to the normal page location,
> and that is fsync'd. Once that is done, the database writes have no risk
> of being torn, and the double-write buffer is marked as empty. ...

If the double-write buffer survives until the next checkpoint, double-writing only the first write should work just fine. The advantage over current full-page writes is that the write is not into the WAL stream and is done (hopefully) by the bgwriter/checkpointer in the background.

--
Ants Aasma
Re: [HACKERS] 16-bit page checksums for 9.2
2012/1/4 Simon Riggs <si...@2ndquadrant.com>:
> On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund <and...@anarazel.de> wrote:
>> I wonder if CRC32c wouldn't be a good alternative, given that more and
>> more CPUs support calculating it in silicon (it's in SSE 4.2).
>
> We're trying to get something that fits in 16 bits for this release.
> I'm guessing CRC32c doesn't?

What happens to the problem-detecting performance of a 16-bit part of a CRC32c vs. a real 16-bit checksum? If it is still as good, it might make sense to use the former, assuming that there is a way to easily trigger the silicon support and enough CPUs support it.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
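To make Nicolas's question concrete, a plain bitwise CRC-32C sketch (the Castagnoli polynomial, the same CRC the SSE 4.2 `crc32` instruction computes) with a low-16-bit truncation. This is only an illustration for experimentation, not proposed patch code, and it deliberately leaves open the thread's question of how well a truncated CRC detects errors:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C using the reflected polynomial 0x82F63B78.
 * Hardware implementations (SSE 4.2) produce the same value, much faster. */
uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFF;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78 & (0 - (crc & 1)));
    }
    return crc ^ 0xFFFFFFFF;
}

/* One possible way to fit the 16-bit slot: keep only the low half. */
uint16_t crc32c_16(const unsigned char *buf, size_t len)
{
    return (uint16_t) crc32c(buf, len);
}
```

The standard check value for CRC-32C over the ASCII string "123456789" is 0xE3069283, which gives 0x9283 after truncation.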
Re: [HACKERS] pgstat wait timeout
I have installed a RAM disk and pointed the parameter at it:

#stats_temp_directory = 'B:\pg_stat_tmp'

I also tried:

#stats_temp_directory = 'B:/pg_stat_tmp'

But still there is no file created in the RAM disk. The previous stat file is touched even after the change is made. (I have restarted the service after making the change.)
Re: [HACKERS] pgstat wait timeout
On 4 Leden 2012, 13:17, pratikchirania wrote:
> I have installed RAMdisk and pointed the parameter:
> #stats_temp_directory = 'B:\pg_stat_tmp'
>
> I also tried
> #stats_temp_directory = 'B:/pg_stat_tmp'
>
> But, still there is no file created in the RAM disk. The previous stat
> file is touched even after the change is made. (I have restarted the
> service after effecting the change)

You have to remove the '#' at the beginning; with the '#' there, the line is commented out.

Tomas
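In other words, the postgresql.conf line would presumably need to look like this (using the forward-slash path from the original report; the path itself is the reporter's, not a recommendation):

```
stats_temp_directory = 'B:/pg_stat_tmp'
```

After editing, a reload (or the service restart already being done) should make the stats files appear in the new directory.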
Re: [HACKERS] 16-bit page checksums for 9.2
Simon Riggs wrote:
> We can either
>
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of ones.

What do you mean? Each byte which goes into the checksum, and the position of that byte, influences the outcome once you've got a non-zero value in sum1. The number of leading NUL bytes would not affect the outcome unless you seed the calculation with something non-zero, but including the page header in the calculation seems to cover that OK.

> (2) report only errors that changed PostgreSQL data.
>
> We already do (1) for WAL CRCs so doing the same thing for page
> checksums makes sense and is much faster. If enough people think we
> should do (2) then it's a simple change to the patch.

To me, (1) makes more sense, but it seems to me you're currently doing (2), because you check in three parts, skipping the free space in the middle of the page.

-Kevin
Re: [HACKERS] 16-bit page checksums for 9.2
Excerpts from Kevin Grittner's message of mié ene 04 04:12:43 -0300 2012:
> Kevin Grittner wrote:
>> if we define sum1 and sum2 as uint I don't see how we can get an
>> overflow with 8k bytes
>
> I feel the need to amend that opinion.
>
> While sum1 only needs to hold a maximum of (BLCKSZ * 255), which would
> be adequate for a BLCKSZ up to 16 MB, sum2 needs to hold up to a
> maximum of ((BLCKSZ * (BLCKSZ + 1) / 2) * 255), so a 32-bit unsigned
> isn't even good for an 8 KB block size. A uint64 for sum2 can handle
> BLCKSZ up to 256 MB.
>
> So we could define sum1 as uint (which is presumably either uint32 or
> uint64) and sum2 as uint64, and for a BLCKSZ up to 16 MB we still don't
> need to find the modulo 255 numbers until we're done adding things up.
> Does anyone think we need to support anything bigger than that? Making
> them both uint64 would support a BLCKSZ up to 256 MB without needing to
> do the division inside the loop, but it might be slightly slower on
> 32-bit builds.

We don't support BLCKSZ higher than 32k anyway ... saith pg_config.h:

/* Size of a disk block --- this also limits the size of a tuple. You can
   set it bigger if you need bigger tuples (although TOAST should reduce
   the need to have large tuples, since fields can be spread across
   multiple tuples).

   BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ is
   currently 2^15 (32768). This is determined by the 15-bit widths of the
   lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).

   Changing BLCKSZ requires an initdb. */
#define BLCKSZ 8192

--
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
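Kevin's bounds can be checked directly. Assuming every byte on the page is 0xFF (the worst case), sum1 grows linearly with the block size while sum2 accumulates the running sum1, hence the triangular-number factor; for BLCKSZ = 8192 the worst-case sum2 is about 8.56e9, past UINT32_MAX:

```c
#include <stdint.h>

/* Worst-case accumulator values for Fletcher sums with the modulo-255
 * reduction deferred to the end, assuming all page bytes are 0xFF. */
uint64_t max_sum1(uint64_t blcksz)
{
    return blcksz * 255;
}

uint64_t max_sum2(uint64_t blcksz)
{
    /* sum2 adds sum1 after each byte: 255 * (1 + 2 + ... + blcksz) */
    return blcksz * (blcksz + 1) / 2 * 255;
}
```

For 8 KB blocks, max_sum1 is 2,088,960 (fits easily in 32 bits) while max_sum2 is 8,557,424,640, confirming that sum2 needs a uint64 even at the default block size.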
Re: [HACKERS] 16-bit page checksums for 9.2
Robert Haas wrote:
> Jeff Janes wrote:
>> But it doesn't seem safe to me to replace a page from the DW buffer and
>> then apply WAL to that replaced page which preceded the age of the page
>> in the buffer.
>
> That's what LSNs are for.

Agreed. If we write the page to the checkpoint buffer just once per checkpoint, recovery can restore the double-written versions of the pages and then begin WAL replay, which will restore all the subsequent changes made to the page.

> Recovery may also need to do additional double-writes if it encounters
> pages for which we wrote WAL but never flushed the buffer, because a
> crash during recovery can also create torn pages.

That's a good point. I think WAL application does need to use double-write. As usual, it doesn't affect *when* a page must be written, but *how*.

> When we reach a restartpoint, we fsync everything down to disk and then
> nuke the double-write buffer.

I think we add to the double-write buffer as we write pages from the buffer to disk. I don't think it makes sense to do potentially repeated writes of the same page with different contents to the double-write buffer as we go; nor is it a good idea to leave the page unsynced and let the double-write buffer grow for a long time.

> Similarly, in normal running, we can nuke the double-write buffer at
> checkpoint time, once the fsyncs are complete.

Well, we should nuke it for re-use as soon as all pages in the buffer are written and fsynced. I'm not at all sure that the best performance is hit by waiting for checkpoint for that versus doing it at page eviction time. The whole reason that double-write techniques don't double the write time is that the buffer is relatively small and the multiple writes to the same disk sectors get absorbed by the BBU write-back cache without actually hitting the disk all the time. Letting the double-write buffer grow to a large size seems likely to me to be a performance killer.

The whole double-write, including fsyncs to the buffer and the actual page location, should just be considered part of the page write process, I think.

-Kevin
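The protocol Kevin describes can be sketched as a toy in-memory model. All names here are hypothetical; real double-write uses a checksum over the home-location page to detect the tear, fsyncs at each step, and a multi-slot buffer, but the ordering invariant is the same: the double-write slot is released only after the home-location write is durably complete, so recovery always has an intact copy of the page:

```c
#include <string.h>

#define PAGE_SZ 64              /* toy page size for illustration */

unsigned char dw_buffer[PAGE_SZ];   /* the double-write slot */
int dw_valid = 0;                   /* slot still holds a pending page? */
unsigned char disk_page[PAGE_SZ];   /* the page's home location on "disk" */

/* Write a page; tear_after < PAGE_SZ simulates a crash mid-write. */
void page_write(const unsigned char *page, int tear_after)
{
    memcpy(dw_buffer, page, PAGE_SZ);   /* step 1: double-write (fsync'd) */
    dw_valid = 1;
    memcpy(disk_page, page, tear_after);/* step 2: home write, maybe torn */
    if (tear_after == PAGE_SZ)
        dw_valid = 0;                   /* step 3: home write fsync'd,
                                         * slot released for re-use */
}

/* Crash recovery: restore any page whose slot was never released,
 * then WAL replay proceeds as usual, guided by LSNs. */
void recover(void)
{
    if (dw_valid)
        memcpy(disk_page, dw_buffer, PAGE_SZ);
    dw_valid = 0;
}
```

Tearing the write halfway leaves the home location inconsistent, but recovery restores it from the slot before replay starts, which is exactly why WAL application never sees a torn page.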
Re: [HACKERS] 16-bit page checksums for 9.2
Simon, all,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of ones.

Do we actually know when/where it's supposed to be all zeros, and hence could we check for that explicitly? If we know what it's supposed to be, in order to be consistent with other information, I could certainly see value in actually checking that. I don't think that's valuable enough to go breaking abstraction layers or bending over backwards to do it, though.

If we don't have the knowledge, at the right level, that the data should all be zeros, then including those pieces in the CRC certainly makes sense to me.

Just my 2c.

Thanks,

Stephen
Re: [HACKERS] 16-bit page checksums for 9.2
Simon Riggs wrote:
> My focus was on getting something working first, then tuning. If we're
> agreed that we have everything apart from the tuning then we can
> proceed with tests to see which works better.

Sure. I just think you are there already, except for what I got into. FWIW, moving the modulus application out of the loop is a very trivial change and has no effect on the results; it's strictly a performance issue.

-Kevin
Re: [HACKERS] 16-bit page checksums for 9.2
Alvaro Herrera wrote:
> We don't support BLCKSZ higher than 32k anyway

Thanks for pointing that out. Then I think we should declare sum1 to be uint and sum2 to be uint64. We can take the % 255 out from where it sits in the v2 patch, and just add something like this after the sums are generated:

    sum1 %= 255;
    sum2 %= 255;

Or, of course, we can just do it on the line where we combine the two sums for the final result, if that's not too hard to read.

-Kevin
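Put together, the deferred-modulo Fletcher-16 being discussed would look roughly like this (a sketch of the idea, not the patch's actual code): sum1 as a 32-bit accumulator and sum2 as 64-bit, with the single modulo-255 reduction after the loop. Because modular reduction distributes over addition, this produces exactly the same checksum as the textbook per-byte reduction:

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 with the % 255 deferred to the end of the loop.
 * For BLCKSZ up to 32k, sum1 fits in uint32 (32768 * 255) and sum2
 * fits in uint64 ((32768 * 32769 / 2) * 255), so no division is
 * needed inside the loop. */
uint16_t fletcher16(const unsigned char *data, size_t len)
{
    uint32_t sum1 = 0;
    uint64_t sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 += data[i];
        sum2 += sum1;        /* running sum of sum1 */
    }
    sum1 %= 255;
    sum2 %= 255;
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

The well-known check value for Fletcher-16 over the ASCII string "abcde" is 0xC8F0, which this deferred form reproduces.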
Re: [HACKERS] review: CHECK FUNCTION statement
Pavel Stehule wrote:
> here is new version of CHECK FUNCTION patch
>
> I changed implementation of interface:
>
> * checked functions return a table instead of raising exceptions - this
>   is necessary for describing more issues inside one function - and it
>   allows using better structured data than ExceptionData
> [...]
> * result of CHECK FUNCTION is a simple table (like EXPLAIN - based on
>   Tom's proposition)

I don't have the time for a complete review, but I tried the patch and found:

- It is in context diff format and applies to current master (there is fuzz 1 in one hunk).
- It contains documentation and regression tests.
- It compiles without warnings and passes the regression tests.
- The two or three CHECK FUNCTIONs I ran worked OK.

The documentation (that I wrote) will need to get updated: currently it states in two places that the checker function should throw a warning if it encounters a problem.

Yours,

Laurenz Albe
Re: [HACKERS] Standalone synchronous master
On Tue, Jan 3, 2012 at 9:22 PM, Robert Haas <robertmh...@gmail.com> wrote:
> It seems to me that if you are happy with #2, you don't really need to
> enable sync rep in the first place.
>
> At any rate, even without multiple component failures, this
> configuration makes it pretty easy to lose durability (which is the
> only point of having sync rep in the first place). Suppose the NIC card
> on the master is the failing component. If it happens to drop the TCP
> connection to the clients just before it drops the connection to the
> standby, the standby will have all the transactions, and you can fail
> over just fine. If it happens to drop the TCP connection to the standby
> just before it drops the connection to the clients, the standby will
> not have all the transactions, and failover will lose some transactions
> - and presumably you enabled this feature in the first place precisely
> to prevent that sort of occurrence.
>
> I do think that it might be useful to have this if there's a
> configurable timeout involved - that way, people could say, well, I'm
> OK with maybe losing transactions if the standby has been gone for X
> seconds. But if the only possible behavior is equivalent to a
> zero-second timeout, I don't think it's too useful. It's basically just
> going to lead people to believe that their data is more secure than it
> really is, which IMHO is not helpful.

So, I'm a big fan of sync rep keeping its guarantees. To me, that's the whole point. Having it fall out of sync rep at any point *automatically* seems to be exactly counter to the point of sync rep.

That said, I'm also a big fan of monitoring everything as well as I can... I'd love a hook script that was run if sync-rep state ever changed (heck, I'd even like it if it just chose a new sync standby). Even better, is there a way we could start injecting NOTIFY events into the cluster on these types of changes? Especially now that notify events can take payloads, it means I don't have to keep constantly polling the database to see if it thinks it's connected, etc.

a.

--
Aidan Van Dyk                  Create like a god,
ai...@highrise.ca              command like a king,
http://www.highrise.ca/        work like a slave.
Re: [HACKERS] BGWriter latch, power saving
On 4 January 2012 07:24, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> I think SetBufferCommitInfoNeedsSave() needs the same treatment as
> MarkBufferDirty(). And it would probably be good to only set the latch
> if the buffer wasn't dirty already. Setting a latch that's already set
> is fast, but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

> Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a
> bit worried about the impact on systems with a lot of CPUs. If you have
> a lot of CPUs writing to the same cache line that contains the latch's
> flag, that might get expensive.

Also reasonable, but I don't think that I'll get around to it until after the final commitfest deadline.

--
 Peter Geoghegan               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 ****
--- 416,422 ----
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal->bgwriterLatch = &MyProc->procLatch;	/* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 50,55 ****
--- 50,56 ----
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
  #include "storage/latch.h"
+ #include "storage/proc.h"
  #include "storage/shmem.h"
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..59e5180
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***************
*** 51,56 ****
--- 51,58 ----
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
*** static volatile sig_atomic_t shutdown_re
*** 73,78 ****
--- 75,82 ----
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS	1
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*** BackgroundWriterMain(void)
*** 123,129 ****
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);		/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 ----
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*** BackgroundWriterMain(void)
*** 238,278 ****
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died. This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
! 
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
! 
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 ----
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not lapped - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int			res = 0;
  			/*
! 			 * Initiate hibernation by arming the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from
! 			 * client backends while the bgwriter is idle, and not during
! 			 * productive cycles where buffers are written, or shortly
! 			 * thereafter. It's important that the SetLatch() call within
! 			 * the buffer manager usually inexpensively returns
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Simon, all,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> (1) report all errors on a page, including errors that don't change
>> PostgreSQL data. This involves checksumming long strings of zeroes,
>> which the checksum algorithm can't tell apart from long strings of
>> ones.
>
> Do we actually know when/where it's supposed to be all zeros, and hence
> could we check for that explicitly? If we know what it's supposed to
> be, in order to be consistent with other information, I could certainly
> see value in actually checking that.

Yes, we can. Excellent suggestion, will implement.

That means we can keep the CRC calc fast as well as check the whole of the page inbound.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Regarding Checkpoint Redo Record
Why does PostgreSQL need to write a WAL record for a checkpoint when it maintains the same information in the pg_control file?

This may be required in case we need information about more than one checkpoint, as pg_control can hold information about only the most recent checkpoint. But I could not think of a case where more than one checkpoint's information would be required. Could anybody let me know the cases where it is required?
Re: [HACKERS] Regarding Checkpoint Redo Record
On 04.01.2012 08:42, Amit Kapila wrote:
> Why PostgreSQL needs to write WAL record for Checkpoint when it
> maintains same information in pg_control file?

I guess it wouldn't be strictly necessary...

> This may be required in case we need information about more than one
> checkpoint as pg_control can hold information of only recent
> checkpoint. But I could not think of a case where more than one
> checkpoint information will be required.

There is a pointer in the control file to the previous checkpoint, too. It's not normally needed, but we fall back to that if the latest checkpoint cannot be read for some reason, like disk failure. If you have a disk failure and cannot read the latest checkpoint, chances are that you have a corrupt database anyway, but at least we try to recover as much as we can using the previous checkpoint.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Regarding Checkpoint Redo Record
On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 04.01.2012 08:42, Amit Kapila wrote: Why PostgreSQL needs to write WAL record for Checkpoint when it maintains same information in pg_control file? I guess it wouldn't be strictly necessary... Apart from replicated standbys, which need that info for running restartpoints. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 12:47 AM, David E. Wheeler wrote: Oy, this doesn't look good: $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$; The connection to the server was lost. Attempting reset: Succeeded. Try elog(NOTICE, "$^V") Isn't this a Perl bug? It seems to be crashing in SvPVutf8, which means that either Perl passed something that's not an SV to a function declared to accept SVs, or that SvPVutf8 fails on some SVs. Either way, Perl is failing to satisfy the POLA if you ask me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 4, 2012, at 12:44 AM, Andrew Dunstan wrote: Try elog(NOTICE, "$^V") Yeah, I used elog(NOTICE, version->new($^V)) Which was fine. But still, it shouldn't segfault. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 4, 2012, at 8:15 AM, Tom Lane wrote: Isn't this a Perl bug? It seems to be crashing in SvPVutf8, which means that either Perl passed something that's not an SV to a function declared to accept SVs, or that SvPVutf8 fails on some SVs. Either way, Perl is failing to satisfy the POLA if you ask me. Possibly, though I know nothing of Perl’s internals. Someone able to come up with a simple test case? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 11:15 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 12:47 AM, David E. Wheeler wrote: Oy, this doesn't look good: $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$; The connection to the server was lost. Attempting reset: Succeeded. Try elog(NOTICE, "$^V") Isn't this a Perl bug? It seems to be crashing in SvPVutf8, which means that either Perl passed something that's not an SV to a function declared to accept SVs, or that SvPVutf8 fails on some SVs. Either way, Perl is failing to satisfy the POLA if you ask me. Um, not sure. The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a scalar) but some other sort of animal: $^V The revision, version, and subversion of the Perl interpreter, represented as a version object. This variable first appeared in perl 5.6.0; earlier versions of perl will see an undefined value. Before perl 5.10.0 $^V was represented as a v-string. $^V can be used to determine whether the Perl interpreter executing a script is in the right range of versions. For example: warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1 To convert $^V into its string representation use sprintf()'s %vd conversion: printf "version is v%vd\n", $^V; # Perl's version But Util.xs::util_elog() expects an SV and doesn't check whether or not it actually has one. I've found a few other ways of crashing this call (e.g. by passing a typeglob), so maybe we need to test that we actually have an SV. I think SvOK() is what we'd use for that - perl gurus please confirm. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a scalar) but some other sort of animal: Yeah, it's a version object, but I'd have thought that SvPV and friends would automatically stringify such an object. Otherwise, practically any kind of perl extension could be crashed by passing it one, no? But Util.xs::util_elog() expects an SV and doesn't check whether or not it actually has one. I've found a few other ways of crashing this call (e.g. by passing a typeglob), so maybe we need to test that we actually have an SV. I think SvOK() is what we'd use for that - perl gurus please confirm. I looked at that last night but it appeared that SvOK would be perfectly happy. (Didn't actually try it, though, I was just eyeballing the flags in gdb.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regarding Checkpoint Redo Record
On Wed, Jan 4, 2012 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 04.01.2012 08:42, Amit Kapila wrote: Why PostgreSQL needs to write WAL record for Checkpoint when it maintains same information in pg_control file? I guess it wouldn't be strictly necessary... Apart from replicated standbys, which need that info for running restartpoints. Yeah. But, the OP makes me wonder: why can a standby only perform a restartpoint where the master performed a checkpoint? It seems like a standby ought to be able to create a restartpoint anywhere, just by writing everything, flushing it to disk, and updating pg_control. I assume there's some reason that doesn't work, I just don't know what it is... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Wed, Jan 4, 2012 at 9:28 AM, Aidan Van Dyk ai...@highrise.ca wrote: I'ld love a hook script that was run if sync-rep state ever changed (heck, I'ld even like it if it just choose a new sync standby). That seems useful. I don't think the current code quite knows its own state; we seem to have each walsender recompute who the boss is, and if you query pg_stat_replication that redoes the same calculation. I can't shake the feeling that there's a better way... which would also facilitate this. Even better, is there a way we could start injecting notify events into the cluster on these types of changes? Especially now that notify events can take payloads, it means I don't have to keep constantly polling the database to see if it things its connected, etc. I like this idea, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_restore direct to database is broken for --insert dumps
In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php it's pointed out that recent versions of pg_restore fall over on archives made with -Fc --inserts (or --column-inserts), but only when restoring direct to database; if you ask for text output it's perfectly fine. Investigation shows that the problem is that individual INSERT commands are being broken apart at arbitrary buffer boundaries --- you don't see any problem in text output, but when the bufferloads are submitted as separate PQexec calls, of course bad things happen. I believe this worked okay up until my patch here: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a which removed the mini SQL lexer in pg_backup_db.c. I had supposed that that had no useful function except to separate COPY data from not-COPY-data, but in reality it had another function of ensuring that INSERT commands split across zlib bufferload boundaries would get reassembled before they are submitted to PQexec. Not entirely sure what to do about this. We could consider reverting the aforesaid patch and trying to find another way of fixing that code's failure to cope with standard-conforming strings, but I'm not sure that there's a good way to know whether standard_conforming_strings is active here, and anyway that code was ugly as sin and I'd rather not resurrect it. But on the other hand, there are no clear line boundaries in the compressed data, and we can't introduce any without (a) worsening compression and (b) breaking compatibility with existing dump files. Anybody have any bright ideas? I'm fresh out at the moment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
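For illustration only, the reassembly job the removed lexer was implicitly doing looks roughly like this: accumulate arbitrary bufferloads and hand a command to PQexec only once a statement-terminating ';' outside of quotes has been seen. This toy version handles just single-quoted literals; it ignores dollar quoting, escapes, and standard_conforming_strings, which is exactly the hard part Tom describes. All names here are invented:

```c
#include <assert.h>
#include <string.h>
#include <stdbool.h>

/* Accumulates chunks and emits complete statements.  Capacity checks are
 * omitted for brevity; statements are assumed to fit in buf. */
typedef struct {
    char buf[1024];
    int  len;
    bool in_quote;
} StmtAssembler;

/* Feed one arbitrary chunk; returns the number of complete statements
 * copied into out[] (each NUL-terminated). */
int stmt_feed(StmtAssembler *a, const char *chunk,
              char out[][1024], int max_out)
{
    int n = 0;
    for (const char *p = chunk; *p; p++) {
        a->buf[a->len++] = *p;
        if (*p == '\'')
            a->in_quote = !a->in_quote;    /* toggle on quote boundaries */
        else if (*p == ';' && !a->in_quote && n < max_out) {
            /* top-level semicolon: a whole statement is now buffered */
            memcpy(out[n], a->buf, a->len);
            out[n][a->len] = '\0';
            a->len = 0;
            n++;
        }
    }
    return n;
}
```

The point of the sketch is that a ';' arriving inside a string literal, possibly in a different bufferload than its opening quote, must not end the statement, which is why the splitter has to carry quoting state across chunks.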
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 8:31 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: When we reach a restartpoint, we fsync everything down to disk and then nuke the double-write buffer. I think we add to the double-write buffer as we write pages from the buffer to disk. I don't think it makes sense to do potentially repeated writes of the same page with different contents to the double-write buffer as we go; nor is it a good idea to leave the page unsynced and let the double-write buffer grow for a long time. You may be right. Currently, though, we only fsync() at end-of-checkpoint. So we'd have to think about what to fsync, and how often, to keep the double-write buffer to a manageable size. I can't help thinking that any extra fsyncs are pretty expensive, though, especially if you have to fsync() every file that's been double-written before clearing the buffer. Possibly we could have 2^N separate buffers based on an N-bit hash of the relfilenode and segment number, so that we could just fsync 1/(2^N)-th of the open files at a time. But even that sounds expensive: writing back lots of dirty data isn't cheap. One of the systems I've been doing performance testing on can sometimes take 15 seconds to write a shutdown checkpoint, and I'm sure that other people have similar (and worse) problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
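Robert's 2^N-buffers idea reduces to hashing (relfilenode, segment number) to a bucket so that each fsync pass only has to flush the files belonging to one bucket. A sketch with an arbitrary bit mixer, not anything PostgreSQL actually uses:

```c
#include <assert.h>
#include <stdint.h>

/* Map a (relfilenode, segment) pair to one of 2^nbits double-write
 * buffers.  The multiplicative constants are just generic hash mixers. */
static inline uint32_t dw_bucket(uint32_t relfilenode, uint32_t segno,
                                 int nbits)
{
    uint32_t h = relfilenode * 0x9E3779B9u ^ (segno * 0x85EBCA6Bu);
    h ^= h >> 16;                     /* fold high bits down */
    return h & ((1u << nbits) - 1);   /* bucket in 0 .. 2^nbits - 1 */
}
```

With N bits, syncing one bucket at a time touches roughly 1/(2^N) of the open files, which is the spreading effect Robert is after, though as he notes the writeback itself is still not cheap.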
Re: [HACKERS] [RFC] grants vs. inherited tables
On Fri, Dec 30, 2011 at 4:25 AM, Marko Kreen mark...@gmail.com wrote: I have the (hopefully wrong) impression that you're missing the fact that it already exists, at least in 9.0. You are right, I missed it. For quite obvious reason: $ grep -ri aclexplode doc/ $ Is there a good reason why it's undocumented? Internal/unstable API? I better avoid it then. But I would like to have this or similar function as part of public API. I don't see any real reason why we couldn't document this one. It returns OIDs, but that's the name of the game if you're doing anything non-trivial with PostgreSQL system catalogs. Off-hand I'm not quite sure which section of the documentation would be appropriate, though. It looks like the functions we provide are mostly documented in chapter 9, Functions and Operators. Section 9.23 on System Information Functions seems like it's probably the closest fit... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
Robert Haas robertmh...@gmail.com wrote: we only fsync() at end-of-checkpoint. So we'd have to think about what to fsync, and how often, to keep the double-write buffer to a manageable size. I think this is the big tuning challenge with this technology. I can't help thinking that any extra fsyncs are pretty expensive, though, especially if you have to fsync() every file that's been double-written before clearing the buffer. Possibly we could have 2^N separate buffers based on an N-bit hash of the relfilenode and segment number, so that we could just fsync 1/(2^N)-th of the open files at a time. I'm not sure I'm following -- we would just be fsyncing those files we actually wrote pages into, right? Not all segments for the table involved? But even that sounds expensive: writing back lots of dirty data isn't cheap. One of the systems I've been doing performance testing on can sometimes take 15 seconds to write a shutdown checkpoint, Consider the relation-file fsyncs for double-write as a form of checkpoint spreading, and maybe it won't seem so bad. It should make that shutdown checkpoint less painful. Now, I have been thinking that on a write-heavy system you had better have a BBU write-back cache, but that's my recommendation, anyway. and I'm sure that other people have similar (and worse) problems. Well, I have no doubt that this feature should be optional. Those who prefer can continue to do full-page writes to the WAL, instead. Or take the running with scissors approach. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting -Werror in CFLAGS
On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Yes, I know that these only appeared in GCC 4.6+ and as such are a relatively recent phenomenon, but there has been some effort to eliminate them, and if I could get a non-hacked -Werror build I'd feel happy enough about excluding them as already outlined. I just do this: echo COPT=-Werror > src/Makefile.custom ...which seems to work reasonably well. I see no point in -Werror whatsoever. If you aren't examining the make output for warnings, you're not following proper development practice IMO. I find -Werror to be a convenient way to examine the output for warnings. Otherwise they scroll off the screen. Yeah, I could save the output to a file and grep it afterwards, but that seems less convenient. I'm clearly not the only one doing it this way, since src/backend/parser/gram.o manually sticks in -Wno-error... gcc is not the only tool we use in the build process, so if you are relying on -Werror to call attention to everything you should be worrying about, you lost already. Hmm, I guess. I've never had a problem with anything else that I can remember, though. I'm also less than thrilled with the idea that whatever the gcc boys decide to make a warning tomorrow will automatically become a MUST FIX NOW for us. I'm not thrilled about that either. Especially since they seem to be adding more and more warnings that are harder and harder to work around for issues that are less and less important. Unimportant warnings that are easily avoidable are not so bad, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting -Werror in CFLAGS
On 4 January 2012 18:44, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm also less than thrilled with the idea that whatever the gcc boys decide to make a warning tomorrow will automatically become a MUST FIX NOW for us. I'm not thrilled about that either. Especially since they seem to be adding more and more warnings that are harder and harder to work around for issues that are less and less important. Unimportant warnings that are easily avoidable are not so bad, but... I'd have a certain amount of sympathy for that view. It took building with Clang to notice that we incorrectly used one enum rvalue to assign to a variable of another enum type, which I thought was a little bit surprising; I'd have expected GCC to catch that one, since it is pretty likely to be valid. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Dec 23, 2011, at 2:23 PM, Kevin Grittner wrote: Jeff Janes jeff.ja...@gmail.com wrote: Could we get some major OLTP users to post their CLOG for analysis? I wouldn't think there would be much security/propietary issues with CLOG data. FWIW, I got the raw numbers to do my quick check using this Ruby script (put together for me by Peter Brant). If it is of any use to anyone else, feel free to use it and/or post any enhanced versions of it. Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. I have a number of other systems I can profile if anyone's interested. 063A 379 063B 143 063C 94 063D 94 063E 326 063F 113 0640 122 0641 270 0642 81 0643 390 0644 183 0645 76 0646 61 0647 50 0648 275 0649 288 064A 126 064B 53 064C 59 064D 125 064E 357 064F 92 0650 54 0651 83 0652 267 0653 328 0654 118 0655 75 0656 104 0657 280 0658 414 0659 105 065A 74 065B 153 065C 303 065D 63 065E 216 065F 169 0660 113 0661 405 0662 85 0663 52 0664 44 0665 78 0666 412 0667 116 0668 48 0669 61 066A 66 066B 364 066C 104 066D 48 066E 68 066F 104 0670 465 0671 158 0672 64 0673 62 0674 115 0675 452 0676 296 0677 65 0678 80 0679 177 067A 316 067B 86 067C 87 067D 270 067E 84 067F 295 0680 299 0681 88 0682 35 0683 67 0684 66 0685 456 0686 146 0687 52 0688 33 0689 73 068A 147 068B 345 068C 107 068D 67 068E 50 068F 97 0690 473 0691 156 0692 47 0693 57 0694 97 0695 550 0696 224 0697 51 0698 80 0699 280 069A 115 069B 426 069C 241 069D 395 069E 98 069F 130 06A0 523 06A1 296 06A2 92 06A3 97 06A4 122 06A5 524 06A6 256 06A7 118 06A8 111 06A9 157 06AA 553 06AB 166 06AC 106 06AD 103 06AE 200 06AF 621 06B0 288 06B1 95 06B2 107 06B3 227 06B4 92 06B5 447 06B6 210 06B7 364 06B8 119 06B9 113 06BA 384 06BB 319 06BC 45 06BD 68 06BE 2 -- Jim C. 
Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting -Werror in CFLAGS
On 04.01.2012 20:44, Robert Haas wrote: On Tue, Jan 3, 2012 at 9:23 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haasrobertmh...@gmail.com writes: On Tue, Jan 3, 2012 at 7:39 PM, Peter Geogheganpe...@2ndquadrant.com wrote: Yes, I know that these only appeared in GCC 4.6+ and as such are a relatively recent phenomenon, but there has been some effort to eliminate them, and if I could get a non-hacked -Werror build I'd feel happy enough about excluding them as already outlined. I just do this: echo COPT=-Werror src/Makefile.custom ...which seems to work reasonably well. I see no point in -Werror whatsoever. If you aren't examining the make output for warnings, you're not following proper development practice IMO. I find -Werror to be a convenient way to examine the output for warnings. Otherwise they scroll off the screen. Yeah, I could save the output to a file and grep it afterwards, but that seems less convenient. I'm clearly not the only one doing it this way, since src/backend/parser/gram.o manually sticks in -Wno-error... I use make -s. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting -Werror in CFLAGS
On 01/04/2012 02:35 PM, Heikki Linnakangas wrote: On 04.01.2012 20:44, Robert Haas wrote: On Tue, Jan 3, 2012 at 9:23 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haasrobertmh...@gmail.com writes: On Tue, Jan 3, 2012 at 7:39 PM, Peter Geogheganpe...@2ndquadrant.com wrote: Yes, I know that these only appeared in GCC 4.6+ and as such are a relatively recent phenomenon, but there has been some effort to eliminate them, and if I could get a non-hacked -Werror build I'd feel happy enough about excluding them as already outlined. I just do this: echo COPT=-Werror src/Makefile.custom ...which seems to work reasonably well. I see no point in -Werror whatsoever. If you aren't examining the make output for warnings, you're not following proper development practice IMO. I find -Werror to be a convenient way to examine the output for warnings. Otherwise they scroll off the screen. Yeah, I could save the output to a file and grep it afterwards, but that seems less convenient. I'm clearly not the only one doing it this way, since src/backend/parser/gram.o manually sticks in -Wno-error... I use make -s. Yeah, that's a good thing to do. We are by far the most vigilant project I am aware of about fixing warnings. That's a Good Thing (tm,). Build most FOSS software and you see huge numbers of warnings fly by. It can get quite distressing. We turn the errors off for gram.o precisely because we can't control it, since the included source file scan.c is generated by flex. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those numbers wind up meaning that 18% of the 256-byte blocks (1024 transactions each) were all commits. Yikes. That pretty much shoots down Robert's idea of summarized CLOG data, I think. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 1:32 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: we only fsync() at end-of-checkpoint. So we'd have to think about what to fsync, and how often, to keep the double-write buffer to a manageable size. I think this is the big tuning challenge with this technology. One of them, anyway. I think it may also be tricky to make sure that a backend that needs to write a dirty buffer doesn't end up having to wait for a double-write to be fsync'd. I can't help thinking that any extra fsyncs are pretty expensive, though, especially if you have to fsync() every file that's been double-written before clearing the buffer. Possibly we could have 2^N separate buffers based on an N-bit hash of the relfilenode and segment number, so that we could just fsync 1/(2^N)-th of the open files at a time. I'm not sure I'm following -- we would just be fsyncing those files we actually wrote pages into, right? Not all segments for the table involved? Yes. But even that sounds expensive: writing back lots of dirty data isn't cheap. One of the systems I've been doing performance testing on can sometimes take 15 seconds to write a shutdown checkpoint, Consider the relation-file fsyncs for double-write as a form of checkpoint spreading, and maybe it won't seem so bad. It should make that shutdown checkpoint less painful. Now, I have been thinking that on a write-heavy system you had better have a BBU write-back cache, but that's my recommendation, anyway. I think this point has possibly been beaten to death, but at the risk of belaboring the point I'll bring it up again: the frequency with which we fsync() is basically a trade-off between latency and throughput. If you fsync a lot, then each one will be small, so you shouldn't experience much latency, but throughput might suck. 
If you don't fsync very much, then you maximize the chances for write-combining (because inserting an fsync between two writes to the same block forces that block to be physically written twice rather than just once) thus improving throughput, but when you do get around to calling fsync() there may be a lot of data to write all at once, and you may get a gigantic latency spike. As far as I can tell, one fsync per checkpoint is the theoretical minimum, and that's what we do now. So our current system is optimized for throughput. The decision to put full-page images into WAL rather than a separate buffer is essentially turning the dial in the same direction, because, in effect, the double-write fsync piggybacks on the WAL fsync which we must do anyway. So both the decision to use a double-write buffer AT ALL and the decision to fsync more frequently to keep that buffer to a manageable size are going to result in turning that dial in the opposite direction. It seems to me inevitable that, even with the best possible implementation, throughput will get worse. With a good implementation but not a bad one, latency should improve. Now, this is not necessarily a reason to reject the idea. I believe that several people have proposed that our current implementation is *overly* optimized for throughput *at the expense of* latency, and that we might want to provide some options that, in one way or another, fsync more frequently, so that checkpoint spikes aren't as bad. But when it comes time to benchmark, we might need to think somewhat carefully about what we're testing... Another thought here is that double-writes may not be the best solution, and are almost certainly not the easiest-to-implement solution. We could instead do something like this: when an unlogged change is made to a buffer (e.g. a hint bit is set), we set a flag on the buffer header. When we evict such a buffer, we emit a WAL record that just overwrites the whole buffer with a new FPI. 
There are some pretty obvious usage patterns where this is likely to be painful (e.g. load a big table without setting hint bits, and then seq-scan it). But there are also many use cases where the working set fits inside shared buffers and data pages don't get written very often, apart from checkpoint time, and those cases might work just fine. Also, the cases that are problems for this implementation are likely to also be problems for a double-write based implementation, for exactly the same reasons: if you discover at buffer eviction time that you need to fsync something (whether it's WAL or DW), it's going to hurt. Checksums aren't free even when using double-writes: if you don't have checksums, pages that have only hint bit-changes don't need to be double-written. If double writes aren't going to give us anything for free, maybe that's not the right place to be focusing our efforts... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
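Robert's alternative ("flag the buffer on an unlogged change, emit a full-page image at eviction") can be sketched as follows. The flag name, types, and WAL call are all invented stand-ins, not real PostgreSQL code; XLogInsert of an FPI is reduced to a counter so the control flow is visible:

```c
#include <assert.h>
#include <stdint.h>

/* Invented flag: the buffer took a change that was not WAL-logged. */
#define BM_UNLOGGED_CHANGE 0x01

typedef struct {
    uint32_t flags;
    /* ... page contents would live here ... */
} BufferDescSketch;

static int fpi_records_emitted = 0;

/* Stand-in for XLogInsert() of a full-page image. */
static void wal_log_full_page_image(BufferDescSketch *buf)
{
    (void) buf;
    fpi_records_emitted++;
}

/* A hint-bit update: changes the page without writing WAL, so we just
 * remember that the page now differs from anything WAL can reconstruct. */
void set_hint_bit(BufferDescSketch *buf)
{
    buf->flags |= BM_UNLOGGED_CHANGE;
}

/* At eviction, pages carrying unlogged changes get a fresh FPI first, so
 * the on-disk page (and its checksum) can always be validated after a
 * crash.  Clean pages pay nothing. */
void evict_buffer(BufferDescSketch *buf)
{
    if (buf->flags & BM_UNLOGGED_CHANGE) {
        wal_log_full_page_image(buf);
        buf->flags &= ~BM_UNLOGGED_CHANGE;
    }
    /* ... write the page out ... */
}
```

The sketch also makes Robert's cost concern concrete: a workload that sets hint bits on many buffers and then evicts them (the load-then-seq-scan case) emits one FPI per evicted page.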
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 12:56 PM, Tom Lane wrote: Andrew Dunstanand...@dunslane.net writes: The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a scalar) but some other sort of animal: Yeah, it's a version object, but I'd have thought that SvPV and friends would automatically stringify such an object. Otherwise, practically any kind of perl extension could be crashed by passing it one, no? But Util.xs::util_elog() expects an SV and doesn't check whether or not it actually has one. I've found a few other ways of crashing this call (e.g. by passing a typeglob), so maybe we need to test that we actually have an SV. I think SvOK() is what we'd use for that - perl gurus please confirm. I looked at that last night but it appeared that SvOK would be perfectly happy. (Didn't actually try it, though, I was just eyeballing the flags in gdb.) I tested it and you're right, it doesn't help. I don't see what else we can do about it. There doesn't appear to be any test for an SV in the API. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE
Git master can already avoid rewriting the table for column type changes like varchar(10) -> varchar(20). However, if the column in question is on either side of a FK relationship, we always revalidate the foreign key. Concretely, I wanted these no-rewrite type changes to also assume FK validity:

- Any typmod-only change
- text -> varchar
- domain -> base type

To achieve that, this patch skips the revalidation when two conditions hold: (a) Old and new pg_constraint.conpfeqop match exactly. This is actually stronger than needed; we could loosen things by way of operator families. However, no core type would benefit, and my head exploded when I tried to define the more-generous test correctly. (b) The functions, if any, implementing a cast from the foreign type to the primary opcintype are the same. For this purpose, we can consider a binary coercion equivalent to an exact type match. When the opcintype is polymorphic, require that the old and new foreign types match exactly. (Since ri_triggers.c does use the executor, the stronger check for polymorphic types is no mere future-proofing. However, no core type exercises its necessity.) These follow from the rules used to decide when to rebuild an index. I further justify them in source comments. To implement this, I have ATPostAlterTypeParse() stash the content of the old pg_constraint.conpfeqop in the Constraint node. ATAddForeignKeyConstraint() notices that and evaluates the above rules. If they both pass, it omits the validation step just as though skip_validation had been given.
Thanks, nm diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3b52415..0cde503 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 276,281 static Oid transformFkeyCheckAttrs(Relation pkrel, --- 276,282 int numattrs, int16 *attnums, Oid *opclasses); static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts); + static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid); static void validateCheckConstraint(Relation rel, HeapTuple constrtup); static void validateForeignKeyConstraint(char *conname, Relation rel, Relation pkrel, *** *** 357,362 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMOD --- 358,364 static void ATPostAlterTypeParse(Oid oldId, char *cmd, List **wqueue, LOCKMODE lockmode, bool rewrite); static void TryReuseIndex(Oid oldId, IndexStmt *stmt); + static void TryReuseForeignKey(Oid oldId, Constraint *con); static void change_owner_fix_column_acls(Oid relationOid, Oid oldOwnerId, Oid newOwnerId); static void change_owner_recurse_to_sequences(Oid relationOid, *** *** 5696,5701 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5698,5705 numpks; Oid indexOid; Oid constrOid; + boolold_check_ok; + ListCell *old_pfeqop_item = list_head(fkconstraint-old_conpfeqop); /* * Grab an exclusive lock on the pk table, so that someone doesn't delete *** *** 5812,5817 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5816,5828 (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg(number of referencing and referenced columns for foreign key disagree))); + /* +* On the strength of a previous constraint, we might avoid scanning +* tables to validate this one. See below. 
+ */ + old_check_ok = (fkconstraint->old_conpfeqop != NIL); + Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop)); + for (i = 0; i < numpks; i++) { Oid pktype = pktypoid[i]; *** *** 5826,5831 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5837,5843 Oid ppeqop; Oid ffeqop; int16 eqstrategy; + Oid pfeqop_right; /* We need several fields out of the pg_opclass entry */ cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclasses[i])); *** *** 5868,5877 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, pfeqop = get_opfamily_member(opfamily, opcintype, fktyped, eqstrategy); if (OidIsValid(pfeqop)) ffeqop =
Re: [HACKERS] PL/Perl Does not Like vstrings
On Wed, Jan 4, 2012 at 13:13, Andrew Dunstan and...@dunslane.net wrote: On 01/04/2012 12:56 PM, Tom Lane wrote: I looked at that last night but it appeared that SvOK would be perfectly happy. (Didn't actually try it, though, I was just eyeballing the flags in gdb.) I tested it and you're right, it doesn't help. I don't see what else we can do about it. There doesn't appear to be any test for an SV in the API. I think about the best we can do is something along the lines of: sv2cstr() { ... if (Perl_vverify(sv)) return utf_u2e(SvPV(sv)); ... } I don't think the utf_u2e is strictly needed (other than that it strdups) as I don't think versions can have utf8 chars, or at least that $^V will not have utf8 chars (and even if it did, it would only cause problems if they had codepoints in the 128-255 range). We would still have issues with typeglobs...
Re: [HACKERS] Page Checksums + Double Writes
On Wed, Jan 4, 2012 at 3:02 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those numbers wind up meaning that 18% of the 256-byte blocks (1024 transactions each) were all commits. Yikes. That pretty much shoots down Robert's idea of summarized CLOG data, I think. I'm not *totally* certain of that... another way to look at it is that I have to be able to show a win even if only 18% of the probes into the summarized data are successful, which doesn't seem totally out of the question given how cheap I think lookups could be. But I'll admit it's not real encouraging. I think the first thing we need to look at is increasing the number of CLOG buffers. Even if hypothetical summarized CLOG data had a 60% hit rate rather than 18%, 8 CLOG buffers is probably still not going to be enough for a 32-core system, let alone anything larger. I am aware of two concerns here: 1. Unconditionally adding more CLOG buffers will increase PostgreSQL's minimum memory footprint, which is bad for people suffering under default shared memory limits or running a database on a device with less memory than a low-end cell phone. 2. The CLOG code isn't designed to manage a large number of buffers, so adding more might cause a performance regression on small systems. On Nate Boley's 32-core system, running pgbench at scale factor 100, the optimal number of buffers seems to be around 32. I'd like to get some test results from smaller systems - any chance you (or anyone) have, say, an 8-core box you could test on? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
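For reference on the arithmetic above: CLOG stores two status bits per transaction, so a 256-byte block covers 1024 transactions, and an "all commits" block is simply the committed status code repeated in every 2-bit slot. A summarization probe only wins when an entire block looks like that. A self-contained sketch of the check (mirroring the 2-bit encoding; this is not the actual slru/clog code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4     /* 8 bits / 2 bits per xact */
#define BLOCK_BYTES         256   /* => 1024 transaction statuses */

/* With the committed status code 0x01 in each of the four 2-bit slots,
 * an all-committed byte reads 0b01010101 = 0x55. */
#define ALL_COMMITTED_BYTE  0x55

/* True only when every one of the 1024 statuses in the block is
 * "committed" -- the only case a summarized-CLOG probe could answer. */
static bool clog_block_all_committed(const uint8_t *block)
{
    for (int i = 0; i < BLOCK_BYTES; i++)
        if (block[i] != ALL_COMMITTED_BYTE)
            return false;
    return true;
}
```

The 18% figure then says: for a random probe into that system's history, only about one block in five could be answered from a one-bit-per-block summary; everything else still needs the real CLOG page.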
Re: [HACKERS] controlling the location of server-side SSL files
On Tue, Jan 3, 2012 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 3, 2012 at 6:25 PM, Peter Eisentraut pete...@gmx.net wrote: [ reasons ] I agree with these reasons. We don't get charged $0.50 per GUC, so there's no particular reason to contort things to have fewer of them. Well, there definitely is a distributed cost to each additional GUC. Peter's given what are probably adequate reasons to add several of them here, but that doesn't mean we should not ask the question whether each new GUC is really necessary. No argument. I'm merely saying that I think the rationale for these GUCs is solid enough to justify their existence. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash
Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher: [12:03pm] RhodiumToad: what happens is this [12:04pm] RhodiumToad: postquel_start knows this statement doesn't return the result, so it supplies None_Receiver as the dest-receiver for the query [12:04pm] RhodiumToad: however, it knows it's a plannedStmt, so it fires up the full executor to run it [12:05pm] RhodiumToad: and the executor allocates a new destreceiver in its own memory context, replaces es->qd->dest with it, [12:05pm] RhodiumToad: (the new destreceiver is the one that writes tuples to the created table) [12:06pm] RhodiumToad: then at ExecutorEnd (called from postquel_end), executor shutdown closes the new rel, _and then frees the executor's memory context, including the destreceiver it created [12:07pm] RhodiumToad: postquel_end doesn't know that its setting of ->dest was clobbered, so it goes to try and destroy it again, and gets garbage (if assertions are on) [12:07pm] RhodiumToad: if assertions weren't on, then the rDestroy call is harmless [12:07pm] RhodiumToad: well, mostly harmless [12:07pm] RhodiumToad: sneaky one, that [12:09pm] RhodiumToad: you can confirm it by tracing through that second call to postquel_end and confirming that it's the call to ExecutorEnd that stomps the content of qd->dest [12:12pm] pramsey: confirmed, the pass through ExecutorEnd has clobbered the value so there's garbage when it arrives at line 638 [12:14pm] RhodiumToad: if you trace through ExecutorEnd itself, it should be the FreeExecutorState that does it [12:15pm] RhodiumToad: wonder how far back this bug goes [12:16pm] RhodiumToad: actually not very far [12:17pm] RhodiumToad: older versions just figured that qd->dest was always None_Receiver and therefore did not need an rDestroy call [12:17pm] RhodiumToad: (which is a no-op for None_Receiver) [12:17pm] pramsey: kills my 8.4 [12:17pm] RhodiumToad: so this is broken in 8.4+ [12:17pm] pramsey: ah [12:18pm] RhodiumToad: 8.4 introduced the 
lazy-eval of selects in sql functions [12:19pm] RhodiumToad: prior to that they were always run immediately to completion [12:19pm] RhodiumToad: that requires juggling the destreceiver a bit, hence the bug [12:20pm] RhodiumToad: btw, the first statement of the function shouldn't be needed [12:21pm] RhodiumToad: just ... as $f$ create table foo as select 1 as x; $f$; should be enough to break it [12:31pm] RhodiumToad: there's no trivial fix On Wed, Jan 4, 2012 at 11:32 AM, Paul Ramsey pram...@cleverelephant.ca wrote: One extra detail, my PostgreSQL is compiled with --enable-cassert. This is required to set off the killer function. On Wed, Jan 04, 2012 at 07:17:17PM +, pram...@cleverelephant.ca wrote: The following bug has been logged on the website: Bug reference: 6379 Logged by: Paul Ramsey Email address: pram...@cleverelephant.ca PostgreSQL version: 9.1.2 Operating system: OSX 10.6.8 Description: CREATE OR REPLACE FUNCTION kill_backend() RETURNS VOID AS $$ DROP TABLE if EXISTS foo; CREATE TABLE foo AS SELECT * FROM pg_class LIMIT 1; $$ LANGUAGE 'sql'; SELECT kill_backend();
Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE
I neglected to commit after revising the text of a few comments; use this version instead. Thanks. diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3b52415..9eba8e8 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 276,281 static Oid transformFkeyCheckAttrs(Relation pkrel, --- 276,282 int numattrs, int16 *attnums, Oid *opclasses); static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts); + static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid); static void validateCheckConstraint(Relation rel, HeapTuple constrtup); static void validateForeignKeyConstraint(char *conname, Relation rel, Relation pkrel, *** *** 357,362 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMOD --- 358,364 static void ATPostAlterTypeParse(Oid oldId, char *cmd, List **wqueue, LOCKMODE lockmode, bool rewrite); static void TryReuseIndex(Oid oldId, IndexStmt *stmt); + static void TryReuseForeignKey(Oid oldId, Constraint *con); static void change_owner_fix_column_acls(Oid relationOid, Oid oldOwnerId, Oid newOwnerId); static void change_owner_recurse_to_sequences(Oid relationOid, *** *** 5696,5701 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5698,5705 numpks; Oid indexOid; Oid constrOid; + bool old_check_ok; + ListCell *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop); /* * Grab an exclusive lock on the pk table, so that someone doesn't delete *** *** 5812,5817 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5816,5828 (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("number of referencing and referenced columns for foreign key disagree"))); + /* + * On the strength of a previous constraint, we might avoid scanning + * tables to validate this one. See below. 
+ */ + old_check_ok = (fkconstraint->old_conpfeqop != NIL); + Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop)); + for (i = 0; i < numpks; i++) { Oid pktype = pktypoid[i]; *** *** 5826,5831 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5837,5843 Oid ppeqop; Oid ffeqop; int16 eqstrategy; + Oid pfeqop_right; /* We need several fields out of the pg_opclass entry */ cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclasses[i])); *** *** 5868,5877 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, pfeqop = get_opfamily_member(opfamily, opcintype, fktyped, eqstrategy); if (OidIsValid(pfeqop)) ffeqop = get_opfamily_member(opfamily, fktyped, fktyped, eqstrategy); else ! ffeqop = InvalidOid; /* keep compiler quiet */ if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop))) { --- 5880,5896 pfeqop = get_opfamily_member(opfamily, opcintype, fktyped, eqstrategy); if (OidIsValid(pfeqop)) + { + pfeqop_right = fktyped; ffeqop = get_opfamily_member(opfamily, fktyped, fktyped, eqstrategy); + } else ! { ! /* keep compiler quiet */ ! pfeqop_right = InvalidOid; ! ffeqop = InvalidOid; ! } if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop))) { *** *** 5893,5899 ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, --- 5912,5921 target_typeids[1] = opcintype; if (can_coerce_type(2, input_typeids, target_typeids, COERCION_IMPLICIT)) + {
Re: [HACKERS] 16-bit page checksums for 9.2
Robert Haas robertmh...@gmail.com wrote: I think it may also be tricky to make sure that a backend that needs to write a dirty buffer doesn't end up having to wait for a double-write to be fsync'd. This and other parts of your post seem to ignore the BBU write-back cache. Multiple fsyncs of a single page can still be collapsed at that level to a single actual disk write. In fact, I rather doubt this technology will look very good on machines without write-back caching. I'm not as sure as you are that this is a net loss in throughput, either. When the fsync storm clogs the RAID controller, even reads stall, so something which more evenly pushes writes to disk might avoid these non-productive pauses. I think that could improve throughput enough to balance or exceed the other effects. Maybe. I agree we need to be careful to craft a good set of benchmarks here. Checksums aren't free even when using double-writes: if you don't have checksums, pages that have only hint bit-changes don't need to be double-written. Agreed. Checksums aren't expected to be free under any circumstances. I'm expecting DW to be slightly faster than FPW in general, with or without in-page checksums. If double writes aren't going to give us anything for free, maybe that's not the right place to be focusing our efforts... I'm not sure why it's not enough that they improve performance over the alternative. Making some other feature with obvious overhead free seems an odd requirement to hang on this. (Maybe I'm misunderstanding you on that point?) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 12:56 PM, Tom Lane wrote: I looked at that last night but it appeared that SvOK would be perfectly happy. (Didn't actually try it, though, I was just eyeballing the flags in gdb.) I tested it and you're right, it doesn't help. I don't see what else we can do about it. There doesn't appear to be any test for an SV in the API. I think what's being passed *is* an SV --- at least, the contents look reasonable in gdb --- but for some reason SvPVutf8 isn't coping with this particular kind of SV. Googling suggests that SvPVutf8 used to fail on READONLY SVs, of which this is one if I'm reading the flag bits correctly; but that was supposedly fixed years ago. I believe we've hit some other undocumented limitation of that function, which the Perl guys may or may not acknowledge as a bug once we've tracked it down better. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
Robert Haas robertmh...@gmail.com wrote: 2. The CLOG code isn't designed to manage a large number of buffers, so adding more might cause a performance regression on small systems. On Nate Boley's 32-core system, running pgbench at scale factor 100, the optimal number of buffers seems to be around 32. I'd like to get some test results from smaller systems - any chance you (or anyone) have, say, an 8-core box you could test on? Hmm. I can think of a lot of 4-core servers I could test on. (We have a few poised to go into production where it would be relatively easy to do benchmarking without distorting factors right now.) After that we jump to 16 cores, unless I'm forgetting something. These are currently all in production, but some of them are redundant machines which could be pulled for a few hours here and there for benchmarks. If either of those seem worthwhile, please spec the useful tests so I can capture the right information. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regarding Checkpoint Redo Record
Robert Haas robertmh...@gmail.com writes: But, the OP makes me wonder: why can a standby only perform a restartpoint where the master performed a checkpoint? It seems like a standby ought to be able to create a restartpoint anywhere, just by writing everything, flushing it to disk, and updating pg_control. Perhaps, but then crash restarts would have to accept start pointers that point at any random place in the WAL. I like the additional error checking of verifying that there's a checkpoint record there. Also I think the full-page-write mechanism would no longer protect against torn pages during replay if you did that. regards, tom lane
Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash
Paul Ramsey pram...@cleverelephant.ca writes: Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher: [12:31pm] RhodiumToad: there's no trivial fix IMO the main bug here is that functions.c isn't expecting qd->dest to be overwritten, so we could work around it by keeping a separate private copy of the dest pointer. However, it would also be fair to ask whether there's not a cleaner solution. Perhaps the intoRel stuff should be saving/restoring the original destreceiver instead of just blindly overwriting it. regards, tom lane
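The "private copy" workaround has a simple shape: take the pointer before handing the QueryDesc to the executor, and never re-read qd->dest afterwards. A toy model of the failure and the fix (every type and function name here is an invented stand-in, not the real executor API):

```c
#include <assert.h>
#include <stddef.h>

typedef struct DestReceiver { int valid; } DestReceiver;
typedef struct QueryDesc   { DestReceiver *dest; } QueryDesc;

static DestReceiver none_receiver    = { 1 };
static DestReceiver executor_scratch = { 1 };  /* "lives in executor context" */

/* Models the executor replacing qd->dest with a receiver it allocated in
 * its own memory context (the one that writes tuples to the new table). */
static void fake_executor_run(QueryDesc *qd)
{
    qd->dest = &executor_scratch;
}

/* Models FreeExecutorState: the executor's context is reset, so whatever
 * qd->dest now points at becomes garbage. */
static void fake_executor_end(void)
{
    executor_scratch.valid = 0;
}

/* The workaround: remember the receiver we installed, and destroy that,
 * instead of trusting qd->dest after shutdown. */
static DestReceiver *run_query(QueryDesc *qd)
{
    DestReceiver *mine = qd->dest;   /* private copy, taken up front */
    fake_executor_run(qd);
    fake_executor_end();
    return mine;                     /* still safe to rDestroy */
}
```

The buggy pattern is exactly `run_query` minus the `mine` variable: calling rDestroy on `qd->dest` after shutdown hands it freed memory, which is what the assertion build trips over.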
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 3:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: If double writes aren't going to give us anything for free, maybe that's not the right place to be focusing our efforts... I'm not sure why it's not enough that they improve performance over the alternative. Making some other feature with obvious overhead free seems an odd requirement to hang on this. (Maybe I'm misunderstanding you on that point?) Well, this thread is nominally about checksums, but here we are talking about double writes, so I thought we were connecting those features in some way? Certainly, life is easier if we can develop them completely separately - but checksums really ought to come with some sort of solution to the problem of a torn-page with hint bit changes, IMO, and I thought that's why we're thinking so hard about DW just now. Maybe I'm confused. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Regarding Checkpoint Redo Record
On Wed, Jan 4, 2012 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: But, the OP makes me wonder: why can a standby only perform a restartpoint where the master performed a checkpoint? It seems like a standby ought to be able to create a restartpoint anywhere, just by writing everything, flushing it to disk, and updating pg_control. Perhaps, but then crash restarts would have to accept start pointers that point at any random place in the WAL. I like the additional error checking of verifying that there's a checkpoint record there. I could go either way on that one, but... Also I think the full-page-write mechanism would no longer protect against torn pages during replay if you did that. ...this is a very good point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Page Checksums + Double Writes
On Jan 4, 2012, at 2:02 PM, Kevin Grittner wrote: Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those number wind up meaning that 18% of the 256-byte blocks (1024 transactions each) were all commits. Yikes. That pretty much shoots down Robert's idea of summarized CLOG data, I think. Here's another data point. This is for a londiste slave of what I posted earlier. Note that this slave has no users on it. 054A 654 054B 835 054C 973 054D 1020 054E 1012 054F 1022 0550 284 And these clog files are from Sep 15-30... I believe that's the period when we were building this slave, but I'm not 100% certain. 04F0 194 04F1 253 04F2 585 04F3 243 04F4 176 04F5 164 04F6 358 04F7 505 04F8 168 04F9 180 04FA 369 04FB 318 04FC 236 04FD 437 04FE 242 04FF 625 0500 222 0501 139 0502 174 0503 91 0504 546 0505 220 0506 187 0507 151 0508 199 0509 491 050A 232 050B 170 050C 191 050D 414 050E 557 050F 231 0510 173 0511 159 0512 436 0513 789 0514 354 0515 157 0516 187 0517 333 0518 599 0519 483 051A 300 051B 512 051C 713 051D 422 051E 291 051F 596 0520 785 0521 825 0522 484 0523 238 0524 151 0525 190 0526 256 0527 403 0528 551 0529 757 052A 837 052B 418 052C 256 052D 161 052E 254 052F 423 0530 469 0531 757 0532 627 0533 325 0534 224 0535 295 0536 290 0537 352 0538 561 0539 565 053A 833 053B 756 053C 485 053D 276 053E 241 053F 270 0540 334 0541 306 0542 700 0543 821 0544 402 0545 199 0546 226 0547 250 0548 354 0549 587 This is for a slave of that database that does have user activity: 054A 654 054B 835 054C 420 054D 432 054E 852 054F 666 0550 302 0551 243 0552 600 0553 295 0554 617 0555 504 0556 232 0557 304 0558 580 0559 156 -- Jim C. 
Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Page Checksums + Double Writes
On Wed, Jan 4, 2012 at 4:02 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: 2. The CLOG code isn't designed to manage a large number of buffers, so adding more might cause a performance regression on small systems. On Nate Boley's 32-core system, running pgbench at scale factor 100, the optimal number of buffers seems to be around 32. I'd like to get some test results from smaller systems - any chance you (or anyone) have, say, an 8-core box you could test on? Hmm. I can think of a lot of 4-core servers I could test on. (We have a few poised to go into production where it would be relatively easy to do benchmarking without distorting factors right now.) After that we jump to 16 cores, unless I'm forgetting something. These are currently all in production, but some of them are redundant machines which could be pulled for a few hours here and there for benchmarks. If either of those seem worthwhile, please spec the useful tests so I can capture the right information. Yes, both of those seem useful. To compile, I do this: ./configure --prefix=$HOME/install/$BRANCHNAME --enable-depend --enable-debug ${EXTRA_OPTIONS} make make -C contrib/pgbench make check make install make -C contrib/pgbench install In this case, the relevant builds would probably be (1) master, (2) master with NUM_CLOG_BUFFERS = 16, (3) master with NUM_CLOG_BUFFERS = 32, and (4) master with NUM_CLOG_BUFFERS = 48. (You could also try intermediate numbers if it seems warranted.) Basic test setup: rm -rf $PGDATA ~/install/master/bin/initdb cat >> $PGDATA/postgresql.conf <<EOM shared_buffers = 8GB maintenance_work_mem = 1GB synchronous_commit = off checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.9 wal_writer_delay = 20ms EOM I'm attaching a driver script you can modify to taste. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company runtestw Description: Binary data
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 03:56 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 12:56 PM, Tom Lane wrote: I looked at that last night but it appeared that SvOK would be perfectly happy. (Didn't actually try it, though, I was just eyeballing the flags in gdb.) I tested it and you're right, it doesn't help. I don't see what else we can do about it. There doesn't appear to be any test for an SV in the API. I think what's being passed *is* an SV --- at least, the contents look reasonable in gdb --- but for some reason SvPVutf8 isn't coping with this particular kind of SV. Googling suggests that SvPVutf8 used to fail on READONLY SVs, of which this is one if I'm reading the flag bits correctly; but that was supposedly fixed years ago. I believe we've hit some other undocumented limitation of that function, which the Perl guys may or may not acknowledge as a bug once we've tracked it down better. Well, the crash is apparently solved by the following, which your investigation suggested to me: diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs index 7d0102b..0785e2e 100644 --- a/src/pl/plperl/Util.xs +++ b/src/pl/plperl/Util.xs @@ -41,7 +41,7 @@ do_util_elog(int level, SV *msg) PG_TRY(); { - cmsg = sv2cstr(msg); + cmsg = sv2cstr(newSVsv(msg)); elog(level, "%s", cmsg); pfree(cmsg); } cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 03:56 PM, Tom Lane wrote: I think what's being passed *is* an SV --- at least, the contents look reasonable in gdb --- but for some reason SvPVutf8 isn't coping with this particular kind of SV. Googling suggests that SvPVutf8 used to fail on READONLY SVs, of which this is one if I'm reading the flag bits correctly; but that was supposedly fixed years ago. I believe we've hit some other undocumented limitation of that function, which the Perl guys may or may not acknowledge as a bug once we've tracked it down better. Well, the crash is apparently solved by the following, which your investigation suggested to me: - cmsg = sv2cstr(msg); + cmsg = sv2cstr(newSVsv(msg)); That's kinda grotty ... and leaky ... I installed perl-debuginfo and soon found that SvPVutf8 leads to here: (gdb) s 9066 Perl_croak(aTHX_ "Can't coerce readonly %s to string in %s", (gdb) bt #0 Perl_sv_pvn_force_flags (my_perl=0x17f3170, sv=0x18b6c50, lp=0x7fffb0c8e2f8, flags=<optimized out>) at sv.c:9066 #1 0x0038c30c7003 in Perl_sv_utf8_upgrade_flags_grow (my_perl=0x17f3170, sv=0x18b6c50, flags=2, extra=0) at sv.c:3228 #2 0x0038c30c7778 in Perl_sv_2pvutf8 (my_perl=0x17f3170, sv=0x18b6c50, lp=0x7fffb0c8e370) at sv.c:3079 #3 0x7f4308447614 in sv2cstr (sv=0x18b6c50) at plperl_helpers.h:54 #4 0x7f430844771f in do_util_elog (level=18, msg=0x18b6c50) at Util.xs:44 #5 0x7f4308447bdc in XS__elog (my_perl=0x17f3170, cv=0x181e008) at Util.xs:105 #6 0x0038c30b548f in Perl_pp_entersub (my_perl=0x17f3170) at pp_hot.c:3046 #7 0x0038c30ac796 in Perl_runops_standard (my_perl=0x17f3170) at run.c:41 #8 0x0038c30480ae in Perl_call_sv (my_perl=0x17f3170, sv=0x19843e0, flags=10) at perl.c:2647 #9 0x7f4308440f3e in plperl_call_perl_func (desc=0x7fffb0c8e8b0, fcinfo=0x7fffb0c8fe10) at plperl.c:2018 #10 0x7f430843fa99 in plperl_inline_handler (fcinfo=0x7fffb0c902a0) at plperl.c:1751 which leads to a few conclusions: 1. 
SvPVutf8 fails on readonly SVs, despite the fact that no such limitation is documented and that this was supposedly fixed in 2004, cf http://www.nntp.perl.org/group/perl.perl5.porters/2004/03/msg89505.html We ought to hold somebody's feet to the fire about that. I don't really expect any response beyond documenting the limitation in perlapi, but at least they ought to do that. 2. A slightly cleaner fix for this should be to duplicate the SV and then release the copy around the SvPVutf8 call, only if it's readonly. Fixing it in do_util_elog is entirely the wrong thing. 3. Perl_croak inside a PG_TRY block fails quite nastily. I think we might be well advised to move the sv2cstr(msg) call outside the PG_TRY, but I'm wondering whether there is not a more general structural problem in plperl concerning nesting of PG and Perl error recovery. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
On 01/04/2012 01:13 PM, Tom Lane wrote: In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php it's pointed out that recent versions of pg_restore fall over on archives made with -Fc --inserts (or --column-inserts), but only when restoring direct to database; if you ask for text output it's perfectly fine. Investigation shows that the problem is that individual INSERT commands are being broken apart at arbitrary buffer boundaries --- you don't see any problem in text output, but when the bufferloads are submitted as separate PQexec calls, of course bad things happen. I believe this worked okay up until my patch here: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a which removed the mini SQL lexer in pg_backup_db.c. I had supposed that that had no useful function except to separate COPY data from not-COPY-data, but in reality it had another function of ensuring that INSERT commands split across zlib bufferload boundaries would get reassembled before they are submitted to PQexec. Not entirely sure what to do about this. We could consider reverting the aforesaid patch and trying to find another way of fixing that code's failure to cope with standard-conforming strings, but I'm not sure that there's a good way to know whether standard_conforming_strings is active here, and anyway that code was ugly as sin and I'd rather not resurrect it. But on the other hand, there are no clear line boundaries in the compressed data, and we can't introduce any without (a) worsening compression and (b) breaking compatibility with existing dump files. Anybody have any bright ideas? I'm fresh out at the moment. I pondered this while out on my daily constitutional. The first thing that occurred to me is that it would possibly have been better if we'd made pg_dump not use a quoting mechanism whose behaviour is dependent on a setting (e.g. E'' or dollar quoting). But I guess that's water under the bridge. 
Could we detect an appropriate line ending in ahwrite() after it's been decompressed and buffer partial lines accordingly? cheers andrew
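The buffering Andrew suggests can be sketched as a small accumulator that releases only complete, newline-terminated text and carries the trailing partial line over into the next decompressed chunk. Everything here (the type, the function, the interface) is a hypothetical illustration, not pg_restore's actual code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char   *partial;   /* carried-over tail that has no newline yet */
    size_t  len;
} LineBuf;

/* Append one decompressed chunk.  Returns a malloc'd string containing all
 * complete lines now available (empty string if none); the remainder stays
 * buffered for the next call.  Caller frees the result. */
static char *linebuf_feed(LineBuf *lb, const char *chunk, size_t chunklen)
{
    size_t  total = lb->len + chunklen;
    char   *all = malloc(total + 1);

    if (lb->len)
        memcpy(all, lb->partial, lb->len);
    memcpy(all + lb->len, chunk, chunklen);
    all[total] = '\0';

    char *last_nl = strrchr(all, '\n');
    if (last_nl == NULL)
    {
        /* still no complete line: buffer everything, emit nothing */
        free(lb->partial);
        lb->partial = all;
        lb->len = total;
        char *none = malloc(1);
        none[0] = '\0';
        return none;
    }

    size_t complete = (size_t) (last_nl - all) + 1;

    free(lb->partial);
    lb->len = total - complete;
    lb->partial = malloc(lb->len ? lb->len : 1);
    if (lb->len)
        memcpy(lb->partial, all + complete, lb->len);

    all[complete] = '\0';   /* truncate to the complete portion */
    return all;
}
```

Note the honest limitation: a newline inside a quoted string literal would be treated as a command boundary here, which is precisely the quoting problem the removed mini-lexer existed to handle; this sketch only illustrates the reassembly, not a full answer to Tom's question.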
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 05:05 PM, Tom Lane wrote: Well, the crash is apparently solved by the following, which your investigation suggested to me: - cmsg = sv2cstr(msg); + cmsg = sv2cstr(newSVsv(msg)); That's kinda grotty ... and leaky ... Of course it is. It wasn't meant as a solution but as validation of your suspicions about the nature of the problem. (The leakiness could be solved, though.) cheers andrew
Re: [HACKERS] CLOG contention
On Dec 20, 2011, at 11:29 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: So, what do we do about this? The obvious answer is increase NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea. As you say, that's likely to hurt people running in small shared memory. I too have thought about merging the SLRU areas into the main shared buffer arena, and likewise have concluded that it is likely to be way more painful than it's worth. What I think might be an appropriate compromise is something similar to what we did for autotuning wal_buffers: use a fixed percentage of shared_buffers, with some minimum and maximum limits to ensure sanity. But picking the appropriate percentage would take a bit of research. ISTM that this is based more on number of CPUs rather than total memory, no? Likewise, things like the number of shared buffer partitions would be highly dependent on the number of CPUs. So perhaps we should either probe the number of CPUs on a box, or have a GUC to tell us how many there are... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
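The wal_buffers-style autotuning Tom describes amounts to a clamped fraction. A sketch of the shape of that computation; the divisor and bounds below are placeholders, not the researched values the message says would be needed:

```c
#include <assert.h>

#define CLOG_MIN_BUFFERS 8    /* today's fixed NUM_CLOG_BUFFERS as the floor */
#define CLOG_MAX_BUFFERS 64   /* hypothetical ceiling */

/* shared_buffers given in 8 kB pages, as the GUC stores it internally. */
static int autotune_clog_buffers(int shared_buffers)
{
    int n = shared_buffers / 4096;   /* hypothetical 1/4096 fraction */

    if (n < CLOG_MIN_BUFFERS)
        n = CLOG_MIN_BUFFERS;
    if (n > CLOG_MAX_BUFFERS)
        n = CLOG_MAX_BUFFERS;
    return n;
}
```

With these placeholder numbers, a 1 GB shared_buffers (131072 pages) would land on 32 CLOG buffers, matching the sweet spot Robert reports for the 32-core box; Jim's point stands that a memory-derived formula is only a proxy for what really drives contention, the core count.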
Re: [HACKERS] Autonomous subtransactions
On Dec 19, 2011, at 12:31 PM, Simon Riggs wrote: On Sun, Dec 18, 2011 at 4:22 PM, Jim Nasby j...@nasby.net wrote: On Dec 18, 2011, at 2:28 AM, Gianni Ciolli wrote: I have written some notes about autonomous subtransactions, which have already been touched (at least) in two separate threads; please find them at http://wiki.postgresql.org/wiki/Autonomous_subtransactions The document seems to mix the terms subtransaction and autonomous transaction. That's going to generate a ton of confusion, because both terms already have meaning associated with them: - Autonomous transaction means you can execute something outside of your current transaction and it is in no way affected by the current transaction (doesn't matter if T0 commits or not). - Subtransactions are an alternative to savepoints. They allow you to break a large transaction into smaller chunks, but if T0 doesn't commit then none of the subtransactions do either. OK, perhaps we should just stick to the term Autonomous Transaction. That term is in common use, even if the usage is otherwise exactly the same as a subtransaction, i.e. the main transaction stops until the subtransaction is complete. Except AFAIR Oracle uses the term to indicate something that is happening *outside* of your current transaction, which is definitely not what the proposal is talking about. I'm not wedded to subtransaction (though I think it's a perfectly good name for this), but I definitely think calling this an autonomous transaction would be bad. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 4, 2012, at 2:48 PM, Andrew Dunstan wrote: That's kinda grotty ... and leaky ... Of course it is. It wasn't meant as a solution but as validation of your suspicions about the nature of the problem. (The leakiness could be solved, though.) From #p5p on irc.perl.org: [10:58pm]dg:interesting, so SvPV() handles string overloading, but SvPVutf8() doesn't, yet: "Like C<SvPV>, but converts sv to utf8 first if necessary." [10:58pm]dg:oh, only for readonly objects [10:58pm]dg:probably why no-one has noticed, as version is probably the only readonly thing with string overloading [11:08pm]TonyC:it doesn't need string overloading [11:09pm]TonyC:https://gist.github.com/1562734 [11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() should prevent the leak in that workaround, assuming there's no FREETMPS between the call and use of the return value Useful? David
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 01:13 PM, Tom Lane wrote: Not entirely sure what to do about this. We could consider reverting the aforesaid patch and trying to find another way of fixing that code's failure to cope with standard-conforming strings, but I'm not sure that there's a good way to know whether standard_conforming_strings is active here, and anyway that code was ugly as sin and I'd rather not resurrect it. But on the other hand, there are no clear line boundaries in the compressed data, and we can't introduce any without (a) worsening compression and (b) breaking compatibility with existing dump files. Anybody have any bright ideas? I'm fresh out at the moment. I pondered this while out on my daily constitutional. The first thing that occurred to me is that it would possibly have been better if we'd made pg_dump not use a quoting mechanism whose behaviour is dependent on a setting (e.g. E'' or dollar quoting). But I guess that's water under the bridge. Could we detect an appropriate line ending in ahwrite() after it's been decompressed and buffer partial lines accordingly? Not easily: there could be newlines embedded in data strings or SQL identifiers. I'm not seeing any way around this except to restore the minimal lexing capability. One thing we could probably do is to restrict it to be used only when reading table data, and continue to assume that object creation commands can be emitted as-is. That would at least get us out of needing to parse dollar-quoted literals, which aren't used in the INSERT commands. But we'd have to deal with standard-conforming strings some way. The idea I had about that, since we have an open database connection at hand, is to check the connected backend's standard_conforming_strings state via PQparameterStatus. If it doesn't have the right setting then things are going to fail anyway. 
regards, tom lane
Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash
I wrote: Perhaps the intoRel stuff should be saving/restoring the original destreceiver instead of just blindly overwriting it. I concluded that was the best fix, and have committed it. regards, tom lane
Re: [HACKERS] Add SPI results constants available for PL/*
I agree with Pavel also. Putting these constants in pg_catalog isn't the cleanest solution. Though one can make one's own little lib in Python, Perl, whatever, to store these constants, it would be better if these C constants were exported for the PL/*s at build time. I can't really speak for the procedural languages other than PL/Python, but for this one, typically, it would be nice to have them in the *plpy* module:

result = plpy.execute(sql_query)
if result.status() == plpy.SPI_OK_SELECT:
    # something...

Since the PG developers are the ones who decide these constant values, it's quite logical that the equivalent constants for each PL are made available by the PG developers, rather than each geek reinventing the wheel on their own. Well, it's not a crucial feature of course, but a nice-to-have, that was my point ;-) Have a nice day, guys, Samuel On Wed, Jan 4, 2012 at 3:11 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/03/2012 08:40 PM, Robert Haas wrote: On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I'd suppose it interesting to add a table to pg_catalog containing this data. - it is useless overhead I tend to agree. I expect that defining some constants in Perl or Python is simple Presumably one could instead write a script to transform the list of constants into a .pm file that could be loaded into the backend, or whatever PL/Python's equivalent of that concept is. Not sure if there's a better way to do it. Yeah, I'm with you and Pavel. Here's my quick perl one-liner to produce a set of SPI_* constants for pl/perl. I'm looking at the best way to include this in the bootstrap code.

perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; } print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;' src/include/executor/spi.h

cheers andrew
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
On 01/04/2012 06:20 PM, Tom Lane wrote: But we'd have to deal with standard-conforming strings some way. The idea I had about that, since we have an open database connection at hand, is to check the connected backend's standard_conforming_strings state via PQparameterStatus. If it doesn't have the right setting then things are going to fail anyway. Do we care what it is? Or can we just issue a SET to make it what it needs to be? cheers andrew
Re: [HACKERS] Autonomous subtransactions
On Wed, Jan 04, 2012 at 04:58:08PM -0600, Jim Nasby wrote: Except AFAIR Oracle uses the term to indicate something that is happening *outside* of your current transaction, which is definitely not what the proposal is talking about. That feature is commonly translated in PostgreSQL to a dblink-based solution, which itself is not distant from the current proposal, at least in terms of inside/outside (the biggest difference I can see is on sharing temporary tables). But I am not sure I understand your remark; it would be clearer to me if you could provide an example explaining the difference. Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
Andrew Dunstan and...@dunslane.net writes: On 01/04/2012 06:20 PM, Tom Lane wrote: But we'd have to deal with standard-conforming strings some way. The idea I had about that, since we have an open database connection at hand, is to check the connected backend's standard_conforming_strings state via PQparameterStatus. If it doesn't have the right setting then things are going to fail anyway. Do we care what it is? Well, yeah, we care. Or can we just issue a SET to make it what it needs to be? We already did, but I don't think the code in pg_backup_db has convenient access to the value. I might be wrong about that. regards, tom lane
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 06:15 PM, David E. Wheeler wrote: [11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() should prevent the leak in that workaround, assuming there's no FREETMPS between the call and use of the return value That's the solution to leakiness I had in mind. Tom said: 2. A slightly cleaner fix for this should be to duplicate the SV and then release the copy around the SvPVutf8 call, only if it's readonly. Fixing it in do_util_elog is entirely the wrong thing. How do we tell if it's readonly? cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: Tom said: 2. A slightly cleaner fix for this should be to duplicate the SV and then release the copy around the SvPVutf8 call, only if it's readonly. Fixing it in do_util_elog is entirely the wrong thing. How do we tell if it's readonly? SvREADONLY(sv) macro. regards, tom lane
Re: [HACKERS] pgstat wait timeout
Thanks, I missed that. After making these changes, the observations are as follows: 1. The size of the pgstat file is 86KB. It was last modified when I moved the file location to the RAM disk. 2. The issue persists. I am seeing continuous logs:

2012-01-05 00:00:06 JST WARNING: pgstat wait timeout
2012-01-05 00:00:14 JST WARNING: pgstat wait timeout
2012-01-05 00:00:26 JST WARNING: pgstat wait timeout
. . .
2012-01-05 15:36:25 JST WARNING: pgstat wait timeout
2012-01-05 15:36:37 JST WARNING: pgstat wait timeout
2012-01-05 15:36:45 JST WARNING: pgstat wait timeout