Re: [HACKERS] BGWriter latch, power saving

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 7:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

> Setting a latch that's already set is fast,
> but surely it's even faster to not even try.

Agreed. I think we should SetLatch() at the first point a backend
writes a dirty buffer because the bgwriter has been inactive.

Continually waking the bgwriter makes it harder for it to return to sleep.
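
For concreteness, a minimal sketch of that idea (illustrative only, not the
committed code: BM_DIRTY and the buffer-header conventions come from the
existing buffer manager, bgwriterLatch from the patch later in this thread,
and the function name is made up). Caller is assumed to hold the buffer
header spinlock:

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/latch.h"
#include "storage/proc.h"

/*
 * Sketch: wake the bgwriter only on a buffer's clean->dirty transition,
 * so the common path never even touches an already-set latch.
 */
static void
MarkDirtyAndWakeBgwriter(volatile BufferDesc *bufHdr)
{
    if (!(bufHdr->flags & BM_DIRTY))
    {
        bufHdr->flags |= BM_DIRTY;
        if (ProcGlobal->bgwriterLatch)
            SetLatch(ProcGlobal->bgwriterLatch);
    }
}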

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 11:00 PM, Jim Nasby j...@nasby.net wrote:
> On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
>> (2)  I'm not sure about doing this in three parts, to skip the
>> checksum itself and the hole in the middle of the page.  Is this
>> because the hole might not have predictable data?  Why would that
>> matter, as long as it is read back the same?
>
> IMO not checksumming the free space would be a really bad idea. It's entirely
> possible to have your hardware crapping on your free space, and I'd still
> want to know that that was happening. Now, it would be interesting if the
> free space could be checksummed separately, since there's no reason to refuse
> to read the page if only the free space is screwed up... But given the
> choice, I'd rather get an error when the free space is corrupted and be
> forced to look into things rather than blissfully ignore corrupted free space
> only to be hit later with real data loss.

I see that argument. We don't have space for 2 checksums.

We can either

(1) report all errors on a page, including errors that don't change
PostgreSQL data. This involves checksumming long strings of zeroes,
which the checksum algorithm can't tell apart from long strings of
ones.

(2) report only errors that changed PostgreSQL data.

We already do (1) for WAL CRCs so doing the same thing for page
checksums makes sense and is much faster.

If enough people think we should do (2) then it's a simple change to the patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 10:21 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

> I'm happy with how this looks, except (as noted in a code comment)
> that there seems to be room for optimization of the calculation
> itself.  Details below:

...

> (3)  Rather than having PageSetVerificationInfo() use memcpy,
> followed by a pass through the copied data to calculate the checksum,
> might it be better to have a "copy and calculate" version of the
> function (as in VMware's original patch) to save an extra pass over
> the page image?
>
> Other than these performance tweaks around the calculation phase, I
> didn't spot any problems.  I beat up on it a bit on a couple
> machines without hitting any bugs or seeing any regression test
> failures.

My focus was on getting something working first, then tuning. If we're
agreed that we have everything apart from the tuning then we can
proceed with tests to see which works better.

The "copy and calculate" approach might get in the way of hardware
prefetch since, in my understanding, the memory fetch time exceeds the
calculation time. As discussed elsewhere, using that code or not would
not stop that work being credited.

David, please can you rework the VMware calc patch to produce an
additional 16-bit checksum mechanism in a way compatible with the
16bit patch, so we can test the two versions of the calculation? We
can make the GUC an enum so that the page checksum is selectable (for
testing).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 12:47 AM, David E. Wheeler wrote:

> Is this perhaps by design?
>
> Oy, this doesn’t look good:
>
> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
> ERROR:  server conn crashed?
> ERROR:  server conn crashed?
> The connection to the server was lost. Attempting reset: Succeeded.
> (pgxn@localhost:5900) 06:44:42 [pgxn]
> $



Try

   elog(NOTICE, "$^V")

cheers

andrew



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Andres Freund
On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
> (1)  I like the choice of Fletcher-16.  It should be very good at
> detecting problems while being a lot less expensive than an official
> CRC calculation.
I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
(it's in SSE 4.2) support calculating it in silicon.

Andres



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund and...@anarazel.de wrote:
> On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
>> (1)  I like the choice of Fletcher-16.  It should be very good at
>> detecting problems while being a lot less expensive than an official
>> CRC calculation.
> I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
> (it's in SSE 4.2) support calculating it in silicon.

We're trying to get something that fits in 16 bits for this release.
I'm guessing CRC32c doesn't?


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jim Nasby j...@nasby.net writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and
>> that's pushed out through the catalog then nothing should be trying to
>> access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file.  It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

> Simon is after a different solution involving getting
> rid of the clock sweep...

err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.

But let's discuss that on the main freelist thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf->freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl->numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the clock sweep algorithm */
+	/* Run the clock sweep algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
 		 */
-		LockBufHdr(buf);
 		if (buf->refcount == 0)
 		{
-			if (buf->usage_count > 0)
+			if (buf->usage_count > StrategyControl->completePasses)
 			{
 				buf->usage_count--;
 				trycounter = NBuffers;
 			}
 			else
 			{
-				/* Found a usable buffer */
-				if (strategy != NULL)
-					AddBufferToRing(strategy, buf);
-				return buf;
+				LockBufHdr(buf);
+				if (buf->refcount == 0)
+				{
+					UnlockBufHdr(buf);
+					/* Found a usable buffer */
+					if (strategy != NULL)
+						AddBufferToRing(strategy, buf);
+					return buf;
+				}
+				UnlockBufHdr(buf);
 			}
 		}
 		else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			 * probably better to fail than to risk getting stuck in an
 			 * infinite loop.
 			 */
-			UnlockBufHdr(buf);
 			elog(ERROR, "no unpinned buffers available");
 		}
-		UnlockBufHdr(buf);
 	}
 
 	/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Ants Aasma
On Wed, Jan 4, 2012 at 3:49 AM, Robert Haas robertmh...@gmail.com wrote:
> On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On 12/29/11, Ants Aasma ants.aa...@eesti.ee wrote:
>>> Unless I'm missing something, double-writes are needed for all writes,
>>> not only the first page after a checkpoint. Consider this sequence of
>>> events:
>>>
>>> 1. Checkpoint
>>> 2. Double-write of page A (DW buffer write, sync, heap write)
>>> 3. Sync of heap, releasing DW buffer for new writes.
>>>  ... some time goes by
>>> 4. Regular write of page A
>>> 5. OS writes one part of page A
>>> 6. Crash!
>>>
>>> Now recovery comes along, page A is broken in the heap with no
>>> double-write buffer backup nor anything to recover it by in the WAL.
>
>> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
>> really a regular write as it is the first one after a checkpoint?
>
> I think you nailed it.

No, I should have explicitly stated that no checkpoint happens in
between. I think the confusion here is because I assumed Kevin
described a fixed size d-w buffer in this message:

On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> ...  The file is fsync'd (like I said,
> hopefully to BBU cache), then each page in the double-write buffer is
> written to the normal page location, and that is fsync'd.  Once that
> is done, the database writes have no risk of being torn, and the
> double-write buffer is marked as empty.  ...

If the double-write buffer survives until the next checkpoint,
double-writing only the first write should work just fine. The
advantage over current full-page writes is that the write is not into
the WAL stream and is done (hopefully) by the bgwriter/checkpointer in
the background.
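
For concreteness, the ordering Kevin describes might look roughly like this
(a POSIX-C sketch with hypothetical names and a fixed-offset, single-page
double-write file; error handling is reduced to abort() for brevity):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_SZ 8192

static void
double_write_page(int dw_fd, int heap_fd, off_t heap_off, const char *page)
{
    /* 1. Write the page image into the double-write buffer file. */
    if (pwrite(dw_fd, page, PAGE_SZ, 0) != PAGE_SZ)
        abort();
    /* 2. fsync it -- hopefully absorbed by a BBU write-back cache. */
    if (fsync(dw_fd) != 0)
        abort();
    /* 3. Only now write the page to its normal location... */
    if (pwrite(heap_fd, page, PAGE_SZ, heap_off) != PAGE_SZ)
        abort();
    /* 4. ...and fsync that too, after which the DW slot can be marked
     *    empty; a torn heap page is recoverable from the DW copy at any
     *    point in between. */
    if (fsync(heap_fd) != 0)
        abort();
}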

--
Ants Aasma



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Nicolas Barbier
2012/1/4 Simon Riggs si...@2ndquadrant.com:

> On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund and...@anarazel.de wrote:
>
>> I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
>> (it's in SSE 4.2) support calculating it in silicon.
>
> We're trying to get something that fits in 16 bits for this release.
> I'm guessing CRC32c doesn't?

What happens to the problem-detecting performance of a 16 bit part of
a CRC32c vs. a real 16 bit checksum? If it is still as good, it might
make sense to use the former, assuming that there is a way to easily
trigger the silicon support and enough CPUs support it.
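
For what it's worth, taking the low 16 bits of a hardware CRC32C could look
like the sketch below (assuming SSE 4.2 and building with -msse4.2; whether
such a 16-bit slice keeps CRC32C's error-detection guarantees is exactly the
open question above):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>		/* SSE 4.2 CRC32 intrinsics */

static uint16_t
crc32c_low16(const char *buf, size_t len)
{
    uint64_t    crc = 0xFFFFFFFF;
    size_t      i = 0;

    /* 8 bytes per instruction on 64-bit; memcpy avoids unaligned loads */
    for (; i + 8 <= len; i += 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, buf + i, 8);
        crc = _mm_crc32_u64(crc, chunk);
    }
    for (; i < len; i++)
        crc = _mm_crc32_u8((uint32_t) crc, (uint8_t) buf[i]);
    return (uint16_t) (crc ^ 0xFFFFFFFF);	/* keep only the low 16 bits */
}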

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
I have installed RAMdisk and pointed the parameter:

#stats_temp_directory = 'B:\pg_stat_tmp'
I also tried #stats_temp_directory = 'B:/pg_stat_tmp'

But, still there is no file created in the RAM disk.
The previous stat file is touched even after the change is made. (I have
restarted the service after effecting the change)

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5119436.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread Tomas Vondra
On 4 January 2012, 13:17, pratikchirania wrote:
> I have installed RAMdisk and pointed the parameter:
>
> #stats_temp_directory = 'B:\pg_stat_tmp'
> I also tried #stats_temp_directory = 'B:/pg_stat_tmp'
>
> But, still there is no file created in the RAM disk.
> The previous stat file is touched even after the change is made. (I have
> restarted the service after effecting the change)

You have to remove the '#' at the beginning; as written, the line is commented out.

Tomas




Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Simon Riggs  wrote:
 
> We can either
>
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of
> ones.
 
What do you mean?  Each byte which goes into the checksum, and the
position of that byte, influences the outcome once you've got a
non-zero value in sum1.  The number of leading NIL bytes would not
affect the outcome unless you seed the calculation with something
non-zero, but including the page header in the calculation seems to
cover that OK.
 
> (2) report only errors that changed PostgreSQL data.
>
> We already do (1) for WAL CRCs so doing the same thing for page
> checksums makes sense and is much faster.
>
> If enough people think we should do (2) then it's a simple change to
> the patch.
 
To me, (1) makes more sense, but it seems to me you're currently
doing (2) because you check in three parts, skipping the free space
in the middle of the page.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of Wed Jan 04 04:12:43 -0300 2012:
> Kevin Grittner  wrote:
>
>> if we define sum1 and sum2 as uint I don't see how we can get an
>> overflow with 8k bytes
>
> I feel the need to amend that opinion.
>
> While sum1 only needs to hold a maximum of (BLCKSZ * 255), which
> would be adequate for a BLCKSZ up to 16 MB, sum2 needs to hold up to
> a maximum of ((BLCKSZ * (BLCKSZ + 1) / 2) * 255) so a 32-bit unsigned
> isn't even good for an 8 KB block size.  A uint64 for sum2 can handle
> BLCKSZ up to 256 MB.  So we could define sum1 as uint (which is
> presumably either uint32 or uint64) and sum2 as uint64 and for a
> BLCKSZ up to 16 MB we still don't need to find the modulo 255
> numbers until we're done adding things up.
>
> Does anyone think we need to support anything bigger than that?
> Making them both uint64 would support a BLCKSZ up to 256 MB without
> needing to do the division inside the loop, but it might be slightly
> slower on 32-bit builds.

We don't support BLCKSZ higher than 32k anyway ... saith pg_config.h:

/* Size of a disk block --- this also limits the size of a tuple. You can set
   it bigger if you need bigger tuples (although TOAST should reduce the need
   to have large tuples, since fields can be spread across multiple tuples).
   BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ is
   currently 2^15 (32768). This is determined by the 15-bit widths of the
   lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).
   Changing BLCKSZ requires an initdb. */
#define BLCKSZ 8192

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas  wrote:
> Jeff Janes  wrote:
>
>> But it doesn't seem safe to me to replace a page from the DW buffer
>> and then apply WAL to that replaced page which preceded the age of
>> the page in the buffer.
>
> That's what LSNs are for.
 
Agreed.
 
> If we write the page to the checkpoint buffer just once per
> checkpoint, recovery can restore the double-written versions of the
> pages and then begin WAL replay, which will restore all the
> subsequent changes made to the page. Recovery may also need to do
> additional double-writes if it encounters pages for which we
> wrote WAL but never flushed the buffer, because a crash during
> recovery can also create torn pages.
 
That's a good point.  I think WAL application does need to use
double-write.  As usual, it doesn't affect *when* a page must be
written, but *how*.
 
> When we reach a restartpoint, we fsync everything down to disk and
> then nuke the double-write buffer.
 
I think we add to the double-write buffer as we write pages from the
buffer to disk.  I don't think it makes sense to do potentially
repeated writes of the same page with different contents to the
double-write buffer as we go; nor is it a good idea to leave the page
unsynced and let the double-write buffer grow for a long time.
 
> Similarly, in normal running, we can nuke the double-write buffer
> at checkpoint time, once the fsyncs are complete.
 
Well, we should nuke it for re-use as soon as all pages in the buffer
are written and fsynced.  I'm not at all sure that the best
performance is hit by waiting for checkpoint for that versus doing it
at page eviction time.
 
The whole reason that double-write techniques don't double the write
time is that it is relatively small and the multiple writes to the
same disk sectors get absorbed by the BBU write-back without actually
hitting the disk all the time.  Letting the double-write buffer grow
to a large size seems likely to me to be a performance killer.  The
whole double-write, including fsyncs to buffer and the actual page
location should just be considered part of the page write process, I
think.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Stephen Frost
Simon, all,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of
> ones.

Do we actually know when/where it's supposed to be all zeros, and hence
could we check for that explicitly?  If we know what it's supposed to
be, in order to be consistent with other information, I could certainly
see value in actually checking that.

I don't think that's valuable enough to go breaking abstraction layers
or bending over backwards to do it though.  If we don't have the
knowledge, at the right level, that the data should all be zeros then
including those pieces in the CRC certainly makes sense to me.

Just my 2c.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Simon Riggs  wrote:
 
> My focus was on getting something working first, then tuning. If
> we're agreed that we have everything apart from the tuning then we
> can proceed with tests to see which works better.
 
Sure.  I just think you are there already except for what I got into.
 
FWIW, moving the modulus application out of the loop is a very
trivial change and has no effect on the results; it's strictly a
performance issue.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Alvaro Herrera  wrote:
 
> We don't support BLCKSZ higher than 32k anyway
 
Thanks for pointing that out.  Then I think we should declare sum1 to
be uint and sum2 to be uint64.  We can take the % 255 out from
where it sits in the v2 patch, and just add something like this after
the sums are generated:
 
sum1 %= 255;
sum2 %= 255;
 
Or, of course, we can just do it on the line where we combine the two
sums for the final result, if that's not too hard to read.
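
Putting this subthread together, the loop under discussion would look
something like the following (a sketch, not the v2 patch itself): sum1 is
bounded by BLCKSZ * 255 so a uint32 suffices up to a 16 MB BLCKSZ, sum2 is
bounded by (BLCKSZ * (BLCKSZ + 1) / 2) * 255 so it needs a uint64, and the
modulo then runs once, after the loop:

#include <stdint.h>

static uint16_t
fletcher16_block(const uint8_t *data, uint32_t len)
{
    uint32_t    sum1 = 0;
    uint64_t    sum2 = 0;
    uint32_t    i;

    for (i = 0; i < len; i++)
    {
        sum1 += data[i];		/* running byte sum */
        sum2 += sum1;			/* position-weighted sum */
    }
    /* deferred reduction -- safe given the bounds above */
    sum1 %= 255;
    sum2 %= 255;
    return (uint16_t) ((sum2 << 8) | sum1);
}

As Kevin notes upthread, once sum1 is non-zero every later byte position
affects sum2, but leading zero bytes drop out unless the sums are seeded
with something non-zero.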
 
-Kevin



Re: [HACKERS] review: CHECK FUNCTION statement

2012-01-04 Thread Albe Laurenz
Pavel Stehule wrote:
> here is a new version of the CHECK FUNCTION patch
>
> I changed the implementation of the interface:
>
> * checked functions return a table instead of raising exceptions - this is
> necessary for describing more issues inside one function - and it
> allows the use of better structured data than ExceptionData

[...]

> * the result of CHECK FUNCTION is a simple table (like EXPLAIN - based on
> Tom's proposition)

I don't have the time for a complete review, but I tried the patch
and found:

It is in context diff and applies to current master (there is fuzz 1
in one hunk). It contains documentation and regression tests.
Compiles without warnings and passes regression tests.

The two or three CHECK FUNCTIONs I ran worked ok.

The documentation (that I wrote) will need to get updated: currently
it states in two places that the checker function should throw a
warning if it encounters a problem.

Yours,
Laurenz Albe



Re: [HACKERS] Standalone synchronous master

2012-01-04 Thread Aidan Van Dyk
On Tue, Jan 3, 2012 at 9:22 PM, Robert Haas robertmh...@gmail.com wrote:

> It seems to me that if you are happy with #2, you don't really need to
> enable sync rep in the first place.
>
> At any rate, even without multiple component failures, this
> configuration makes it pretty easy to lose durability (which is the
> only point of having sync rep in the first place).  Suppose the NIC
> card on the master is the failing component.  If it happens to drop
> the TCP connection to the clients just before it drops the connection
> to the standby, the standby will have all the transactions, and you
> can fail over just fine.  If it happens to drop the TCP connection to
> the standby just before it drops the connection to the clients, the
> standby will not have all the transactions, and failover will lose some
> transactions - and presumably you enabled this feature in the first
> place precisely to prevent that sort of occurrence.
>
> I do think that it might be useful to have this if there's a
> configurable timeout involved - that way, people could say, well, I'm
> OK with maybe losing transactions if the standby has been gone for X
> seconds.  But if the only possible behavior is equivalent to a
> zero-second timeout I don't think it's too useful.  It's basically
> just going to lead people to believe that their data is more secure
> than it really is, which IMHO is not helpful.

So, I'm a big fan of syncrep guaranteeing its guarantees.  To me,
that's the whole point.  Having it fall out of sync rep at any point
*automatically* seems to be exactly counter to the point of sync rep.

That said, I'm also a big fan of monitoring everything as well as I could...

I'd love a hook script that was run if sync-rep state ever changed
(heck, I'd even like it if it just chose a new sync standby).

Even better, is there a way we could start injecting notify events
into the cluster on these types of changes?  Especially now that
notify events can take payloads, it means I don't have to keep
constantly polling the database to see if it thinks it's connected,
etc.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] BGWriter latch, power saving

2012-01-04 Thread Peter Geoghegan
On 4 January 2012 07:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> I think SetBufferCommitInfoNeedsSave() needs the same treatment as
> MarkBufferDirty(). And it would probably be good to only set the latch if
> the buffer wasn't dirty already. Setting a latch that's already set is fast,
> but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

> Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a bit
> worried about the impact on systems with a lot of CPUs. If you have a lot of
> CPUs writing to the same cache line that contains the latch's flag, that
> might get expensive.

Also reasonable, but I don't think that I'll get around to it until
after the final commitfest deadline.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 
--- 416,422 
  
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal->bgwriterLatch = &MyProc->procLatch; /* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
  
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***
*** 50,55 
--- 50,56 
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
  #include "storage/latch.h"
+ #include "storage/proc.h"
  #include "storage/shmem.h"
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..59e5180
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***
*** 51,56 
--- 51,58 
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
*** static volatile sig_atomic_t shutdown_re
*** 73,78 
--- 75,82 
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			1
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*** BackgroundWriterMain(void)
*** 123,129 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*** BackgroundWriterMain(void)
*** 238,278 
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
  
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not lapped - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int res = 0;
  			/*
! 			 * Initiate hibernation by arming the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns 

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost sfr...@snowman.net wrote:
> Simon, all,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> (1) report all errors on a page, including errors that don't change
>> PostgreSQL data. This involves checksumming long strings of zeroes,
>> which the checksum algorithm can't tell apart from long strings of
>> ones.
>
> Do we actually know when/where it's supposed to be all zeros, and hence
> could we check for that explicitly?  If we know what it's supposed to
> be, in order to be consistent with other information, I could certainly
> see value in actually checking that.

Yes, we can. Excellent suggestion, will implement.

That means we can keep the CRC calc fast as well as check the whole of
the page inbound.
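
Presumably that check would look something like the following (a sketch only,
under the assumption that the hole between pd_lower and pd_upper should read
back as all zeros; PageHeader, pd_lower and pd_upper come from the existing
page layout, the function itself is hypothetical):

#include "postgres.h"
#include "storage/bufpage.h"

static bool
PageHoleIsZeroed(Page page)
{
    PageHeader  phdr = (PageHeader) page;
    const char *p = (const char *) page;
    int         i;

    /* the checksum skips this range, so verify it explicitly on read */
    for (i = phdr->pd_lower; i < phdr->pd_upper; i++)
    {
        if (p[i] != 0)
            return false;		/* corruption in the free space */
    }
    return true;
}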

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Amit Kapila
Why does PostgreSQL need to write a WAL record for a checkpoint when it
maintains the same information in the pg_control file?

This may be required in case we need information about more than one
checkpoint, as pg_control can hold information about only the most recent
checkpoint. But I could not think of a case where information about more
than one checkpoint would be required.

Could anybody let me know the cases where it is required?

 



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Heikki Linnakangas

On 04.01.2012 08:42, Amit Kapila wrote:

> Why does PostgreSQL need to write a WAL record for a checkpoint when it
> maintains the same information in the pg_control file?


I guess it wouldn't be strictly necessary...


> This may be required in case we need information about more than one
> checkpoint, as pg_control can hold information about only the most recent
> checkpoint. But I could not think of a case where information about more
> than one checkpoint would be required.


There is a pointer in the control file to the previous checkpoint, too. 
It's not normally needed, but we fall back to that if the latest 
checkpoint cannot be read for some reason, like disk failure. If you 
have a disk failure and cannot read the latest checkpoint, chances are 
that you have a corrupt database anyway, but at least we try to recover 
as much as we can using the previous checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On 04.01.2012 08:42, Amit Kapila wrote:
>> Why does PostgreSQL need to write a WAL record for a checkpoint when it
>> maintains the same information in the pg_control file?
>
> I guess it wouldn't be strictly necessary...

Apart from replicated standbys, which need that info for running restartpoints.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
> On 01/04/2012 12:47 AM, David E. Wheeler wrote:
>> Oy, this doesn’t look good:
>> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
>> The connection to the server was lost. Attempting reset: Succeeded.
>
> Try
> elog(NOTICE, "$^V")

Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
means that either Perl passed something that's not an SV to a function
declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
way, Perl is failing to satisfy the POLA if you ask me.

regards, tom lane



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 12:44 AM, Andrew Dunstan wrote:

> Try
>
>   elog(NOTICE, "$^V")

Yeah, I used

elog(NOTICE, version->new($^V))

Which was fine. But still, it shouldn’t segfault.

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 8:15 AM, Tom Lane wrote:

> Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
> means that either Perl passed something that's not an SV to a function
> declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
> way, Perl is failing to satisfy the POLA if you ask me.

Possibly, though I know nothing of Perl’s internals. Someone able to come up 
with a simple test case?

Thanks,

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 11:15 AM, Tom Lane wrote:

> Andrew Dunstan and...@dunslane.net writes:
>> On 01/04/2012 12:47 AM, David E. Wheeler wrote:
>>> Oy, this doesn’t look good:
>>> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
>>> The connection to the server was lost. Attempting reset: Succeeded.
>>
>> Try
>> elog(NOTICE, "$^V")
>
> Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
> means that either Perl passed something that's not an SV to a function
> declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
> way, Perl is failing to satisfy the POLA if you ask me.




Um, not sure.

The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a 
scalar) but some other sort of animal:



$^V The revision, version, and subversion of the Perl interpreter,
represented as a version object.

This variable first appeared in perl 5.6.0; earlier versions of
perl will see an undefined value. Before perl 5.10.0 $^V was
represented as a v-string.

$^V can be used to determine whether the Perl interpreter
executing a script is in the right range of versions. For
example:

warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1

To convert $^V into its string representation use sprintf()'s
"%vd" conversion:

printf "version is v%vd\n", $^V; # Perl's version




But Util.xs::util_elog() expects an SV and doesn't check whether or not 
it actually has one. I've found a few other ways of crashing this call 
(e.g. by passing a typeglob), so maybe we need to test that we actually 
have an SV. I think SvOK() is what we'd use for that - perl gurus please 
confirm.


cheers

andrew



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
> The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a
> scalar) but some other sort of animal:

Yeah, it's a version object, but I'd have thought that SvPV and friends
would automatically stringify such an object.  Otherwise, practically
any kind of perl extension could be crashed by passing it one, no?

> But Util.xs::util_elog() expects an SV and doesn't check whether or not
> it actually has one. I've found a few other ways of crashing this call
> (e.g. by passing a typeglob), so maybe we need to test that we actually
> have an SV. I think SvOK() is what we'd use for that - perl gurus please
> confirm.

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)

regards, tom lane



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> On 04.01.2012 08:42, Amit Kapila wrote:
>>> Why does PostgreSQL need to write a WAL record for a checkpoint when it
>>> maintains the same information in the pg_control file?
>>
>> I guess it wouldn't be strictly necessary...
>
> Apart from replicated standbys, which need that info for running
> restartpoints.

Yeah.

But, the OP makes me wonder: why can a standby only perform a
restartpoint where the master performed a checkpoint?  It seems like a
standby ought to be able to create a restartpoint anywhere, just by
writing everything, flushing it to disk, and updating pg_control.  I
assume there's some reason that doesn't work, I just don't know what
it is...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Standalone synchronous master

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 9:28 AM, Aidan Van Dyk ai...@highrise.ca wrote:
> I'd love a hook script that was run if sync-rep state ever changed
> (heck, I'd even like it if it just chose a new sync standby).

That seems useful.  I don't think the current code quite knows its own
state; we seem to have each walsender recompute who the boss is, and
if you query pg_stat_replication that redoes the same calculation.  I
can't shake the feeling that there's a better way... which would also
facilitate this.

> Even better, is there a way we could start injecting notify events
> into the cluster on these types of changes?  Especially now that
> notify events can take payloads, it means I don't have to keep
> constantly polling the database to see if it thinks it's connected,
> etc.

I like this idea, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php
it's pointed out that recent versions of pg_restore fall over on
archives made with -Fc --inserts (or --column-inserts), but only when
restoring direct to database; if you ask for text output it's perfectly
fine.  Investigation shows that the problem is that individual INSERT
commands are being broken apart at arbitrary buffer boundaries --- you
don't see any problem in text output, but when the bufferloads are
submitted as separate PQexec calls, of course bad things happen.

I believe this worked okay up until my patch here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a
which removed the mini SQL lexer in pg_backup_db.c.  I had supposed that
that had no useful function except to separate COPY data from
not-COPY-data, but in reality it had another function of ensuring that
INSERT commands split across zlib bufferload boundaries would get
reassembled before they are submitted to PQexec.

Not entirely sure what to do about this.  We could consider reverting
the aforesaid patch and trying to find another way of fixing that code's
failure to cope with standard-conforming strings, but I'm not sure that
there's a good way to know whether standard_conforming_strings is active
here, and anyway that code was ugly as sin and I'd rather not resurrect
it.  But on the other hand, there are no clear line boundaries in the
compressed data, and we can't introduce any without (a) worsening
compression and (b) breaking compatibility with existing dump files.

Anybody have any bright ideas?  I'm fresh out at the moment.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 8:31 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
>> When we reach a restartpoint, we fsync everything down to disk and
>> then nuke the double-write buffer.
>
> I think we add to the double-write buffer as we write pages from the
> buffer to disk.  I don't think it makes sense to do potentially
> repeated writes of the same page with different contents to the
> double-write buffer as we go; nor is it a good idea to leave the page
> unsynced and let the double-write buffer grow for a long time.

You may be right.  Currently, though, we only fsync() at
end-of-checkpoint.  So we'd have to think about what to fsync, and how
often, to keep the double-write buffer to a manageable size.  I can't
help thinking that any extra fsyncs are pretty expensive, though,
especially if you have to fsync() every file that's been
double-written before clearing the buffer.  Possibly we could have 2^N
separate buffers based on an N-bit hash of the relfilenode and segment
number, so that we could just fsync 1/(2^N)-th of the open files at a
time.  But even that sounds expensive: writing back lots of dirty data
isn't cheap.  One of the systems I've been doing performance testing
on can sometimes take 15 seconds to write a shutdown checkpoint, and
I'm sure that other people have similar (and worse) problems.
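
The 2^N idea might be as simple as the following sketch (all names
hypothetical; the multiplier is just a standard multiplicative-hash
constant):

#include <stdint.h>

#define DW_PARTITION_BITS	3			/* N */
#define DW_NPARTITIONS		(1 << DW_PARTITION_BITS)

/* Pick a double-write buffer from relfilenode and segment number, so one
 * fsync round only has to cover 1/(2^N)-th of the double-written files. */
static int
dw_partition(uint32_t relfilenode, uint32_t segno)
{
    uint32_t    h = (relfilenode * 2654435761u) ^ segno;

    return (int) (h & (DW_NPARTITIONS - 1));
}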

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] grants vs. inherited tables

2012-01-04 Thread Robert Haas
On Fri, Dec 30, 2011 at 4:25 AM, Marko Kreen mark...@gmail.com wrote:
>> I have the (hopefully wrong) impression that you're missing the fact
>> that it already exists, at least in 9.0.
>
> You are right, I missed it.  For quite obvious reason:
>
>  $ grep -ri aclexplode doc/
>  $
>
> Is there a good reason why it's undocumented?  Internal/unstable API?
> I better avoid it then.  But I would like to have this or similar
> function as part of public API.

I don't see any real reason why we couldn't document this one.  It
returns OIDs, but that's the name of the game if you're doing anything
non-trivial with PostgreSQL system catalogs.  Off-hand I'm not quite
sure which section of the documentation would be appropriate, though.
It looks like the functions we provide are mostly documented in
chapter 9, Functions and Operators.  Section 9.23 on System
Information Functions seems like it's probably the closest fit...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
> we only fsync() at end-of-checkpoint.  So we'd have to think about
> what to fsync, and how often, to keep the double-write buffer to a
> manageable size.
 
I think this is the big tuning challenge with this technology.
 
> I can't help thinking that any extra fsyncs are pretty expensive,
> though, especially if you have to fsync() every file that's been
> double-written before clearing the buffer. Possibly we could have
> 2^N separate buffers based on an N-bit hash of the relfilenode and
> segment number, so that we could just fsync 1/(2^N)-th of the open
> files at a time.
 
I'm not sure I'm following -- we would just be fsyncing those files
we actually wrote pages into, right?  Not all segments for the table
involved?
 
> But even that sounds expensive: writing back lots of dirty data
> isn't cheap.  One of the systems I've been doing performance
> testing on can sometimes take 15 seconds to write a shutdown
> checkpoint,
 
Consider the relation-file fsyncs for double-write as a form of
checkpoint spreading, and maybe it won't seem so bad.  It should
make that shutdown checkpoint less painful.  Now, I have been
thinking that on a write-heavy system you had better have a BBU
write-back cache, but that's my recommendation, anyway.
 
> and I'm sure that other people have similar (and worse) problems.
 
Well, I have no doubt that this feature should be optional.  Those
who prefer can continue to do full-page writes to the WAL, instead. 
Or take the "running with scissors" approach.
 
-Kevin



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Robert Haas
On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com
>> wrote:
>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>> relatively recent phenomenon, but there has been some effort to
>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>> happy enough about excluding them as already outlined.
>
>> I just do this:
>> echo COPT=-Werror >> src/Makefile.custom
>> ...which seems to work reasonably well.
>
> I see no point in -Werror whatsoever.  If you aren't examining the make
> output for warnings, you're not following proper development practice
> IMO.

I find -Werror to be a convenient way to examine the output for
warnings.  Otherwise they scroll off the screen.  Yeah, I could save
the output to a file and grep it afterwards, but that seems less
convenient.  I'm clearly not the only one doing it this way, since
src/backend/parser/gram.o manually sticks in -Wno-error...

> gcc is not the only tool we use in the build process, so if you
> are relying on -Werror to call attention to everything you should be
> worrying about, you lost already.

Hmm, I guess.  I've never had a problem with anything else that I can
remember, though.

> I'm also less than thrilled with the idea that whatever the gcc boys
> decide to make a warning tomorrow will automatically become a MUST FIX
> NOW for us.

I'm not thrilled about that either.  Especially since they seem to be
adding more and more warnings that are harder and harder to work
around for issues that are less and less important.  Unimportant
warnings that are easily avoidable are not so bad, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Peter Geoghegan
On 4 January 2012 18:44, Robert Haas robertmh...@gmail.com wrote:
> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I'm also less than thrilled with the idea that whatever the gcc boys
>> decide to make a warning tomorrow will automatically become a MUST FIX
>> NOW for us.
>
> I'm not thrilled about that either.  Especially since they seem to be
> adding more and more warnings that are harder and harder to work
> around for issues that are less and less important.  Unimportant
> warnings that are easily avoidable are not so bad, but...

I'd have a certain amount of sympathy for that view. It took building
with Clang to notice that we incorrectly used one enum rvalue to
assign to a variable of another enum type, which I thought was a
little bit surprising; I'd have expected GCC to catch that one, since
it is pretty likely to be valid.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby

On Dec 23, 2011, at 2:23 PM, Kevin Grittner wrote:

> Jeff Janes jeff.ja...@gmail.com wrote:
>
>> Could we get some major OLTP users to post their CLOG for
>> analysis?  I wouldn't think there would be many
>> security/proprietary issues with CLOG data.
>
> FWIW, I got the raw numbers to do my quick check using this Ruby
> script (put together for me by Peter Brant).  If it is of any use to
> anyone else, feel free to use it and/or post any enhanced versions
> of it.

Here's output from our largest OLTP system... not sure exactly how to interpret 
it, so I'm just providing the raw data. This spans almost exactly 1 month.

I have a number of other systems I can profile if anyone's interested.

063A 379
063B 143
063C 94
063D 94
063E 326
063F 113
0640 122
0641 270
0642 81
0643 390
0644 183
0645 76
0646 61
0647 50
0648 275
0649 288
064A 126
064B 53
064C 59
064D 125
064E 357
064F 92
0650 54
0651 83
0652 267
0653 328
0654 118
0655 75
0656 104
0657 280
0658 414
0659 105
065A 74
065B 153
065C 303
065D 63
065E 216
065F 169
0660 113
0661 405
0662 85
0663 52
0664 44
0665 78
0666 412
0667 116
0668 48
0669 61
066A 66
066B 364
066C 104
066D 48
066E 68
066F 104
0670 465
0671 158
0672 64
0673 62
0674 115
0675 452
0676 296
0677 65
0678 80
0679 177
067A 316
067B 86
067C 87
067D 270
067E 84
067F 295
0680 299
0681 88
0682 35
0683 67
0684 66
0685 456
0686 146
0687 52
0688 33
0689 73
068A 147
068B 345
068C 107
068D 67
068E 50
068F 97
0690 473
0691 156
0692 47
0693 57
0694 97
0695 550
0696 224
0697 51
0698 80
0699 280
069A 115
069B 426
069C 241
069D 395
069E 98
069F 130
06A0 523
06A1 296
06A2 92
06A3 97
06A4 122
06A5 524
06A6 256
06A7 118
06A8 111
06A9 157
06AA 553
06AB 166
06AC 106
06AD 103
06AE 200
06AF 621
06B0 288
06B1 95
06B2 107
06B3 227
06B4 92
06B5 447
06B6 210
06B7 364
06B8 119
06B9 113
06BA 384
06BB 319
06BC 45
06BD 68
06BE 2
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Heikki Linnakangas

On 04.01.2012 20:44, Robert Haas wrote:

> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
>>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>>> relatively recent phenomenon, but there has been some effort to
>>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>>> happy enough about excluding them as already outlined.
>>
>>> I just do this:
>>> echo COPT=-Werror >> src/Makefile.custom
>>> ...which seems to work reasonably well.
>>
>> I see no point in -Werror whatsoever.  If you aren't examining the make
>> output for warnings, you're not following proper development practice
>> IMO.
>
> I find -Werror to be a convenient way to examine the output for
> warnings.  Otherwise they scroll off the screen.  Yeah, I could save
> the output to a file and grep it afterwards, but that seems less
> convenient.  I'm clearly not the only one doing it this way, since
> src/backend/parser/gram.o manually sticks in -Wno-error...


I use make -s.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 02:35 PM, Heikki Linnakangas wrote:

> On 04.01.2012 20:44, Robert Haas wrote:
>> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Robert Haas robertmh...@gmail.com writes:
>>>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
>>>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>>>> relatively recent phenomenon, but there has been some effort to
>>>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>>>> happy enough about excluding them as already outlined.
>>>
>>>> I just do this:
>>>> echo COPT=-Werror >> src/Makefile.custom
>>>> ...which seems to work reasonably well.
>>>
>>> I see no point in -Werror whatsoever.  If you aren't examining the make
>>> output for warnings, you're not following proper development practice
>>> IMO.
>>
>> I find -Werror to be a convenient way to examine the output for
>> warnings.  Otherwise they scroll off the screen.  Yeah, I could save
>> the output to a file and grep it afterwards, but that seems less
>> convenient.  I'm clearly not the only one doing it this way, since
>> src/backend/parser/gram.o manually sticks in -Wno-error...
>
> I use make -s.


Yeah, that's a good thing to do.

We are by far the most vigilant project I am aware of about fixing 
warnings. That's a Good Thing (tm). Build most FOSS software and you 
see huge numbers of warnings fly by. It can get quite distressing.


We turn the errors off for gram.o precisely because we can't control it, 
since the included source file scan.c is generated by flex.


cheers

andrew






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Jim Nasby j...@nasby.net wrote:
 
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.
 
Those numbers wind up meaning that 18% of the 256-byte blocks (at two
status bits per transaction, 1024 transactions each) were all commits.
Yikes.  That pretty much shoots down Robert's idea of summarized CLOG
data, I think.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 1:32 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 we only fsync() at end-of-checkpoint.  So we'd have to think about
 what to fsync, and how often, to keep the double-write buffer to a
 manageable size.

 I think this is the big tuning challenge with this technology.

One of them, anyway.  I think it may also be tricky to make sure that
a backend that needs to write a dirty buffer doesn't end up having to
wait for a double-write to be fsync'd.

 I can't help thinking that any extra fsyncs are pretty expensive,
 though, especially if you have to fsync() every file that's been
 double-written before clearing the buffer. Possibly we could have
 2^N separate buffers based on an N-bit hash of the relfilenode and
 segment number, so that we could just fsync 1/(2^N)-th of the open
 files at a time.

 I'm not sure I'm following -- we would just be fsyncing those files
 we actually wrote pages into, right?  Not all segments for the table
 involved?

Yes.
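
(A toy reading of the 2^N-buffers idea quoted above -- a sketch only;
the hash is invented for illustration and nothing here is from a patch:)

#include <stdint.h>

#define DW_BUFFER_BITS 3    /* N: 2^3 = 8 separate double-write buffers */

/* Route a page to a double-write buffer by relfilenode and segment
 * number; fsyncing one bucket then touches only the files whose pages
 * hashed to it.  The multiplier is just Knuth's hash constant. */
static int
dw_bucket(uint32_t relfilenode, uint32_t segno)
{
    uint32_t h = (relfilenode * 2654435761u) ^ segno;

    return (int) (h & ((1u << DW_BUFFER_BITS) - 1));
}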

 But even that sounds expensive: writing back lots of dirty data
 isn't cheap.  One of the systems I've been doing performance
 testing on can sometimes take 15 seconds to write a shutdown
 checkpoint,

 Consider the relation-file fsyncs for double-write as a form of
 checkpoint spreading, and maybe it won't seem so bad.  It should
 make that shutdown checkpoint less painful.  Now, I have been
 thinking that on a write-heavy system you had better have a BBU
 write-back cache, but that's my recommendation, anyway.

I think this point has possibly been beaten to death, but at the risk
of belaboring the point I'll bring it up again: the frequency with
which we fsync() is basically a trade-off between latency and
throughput.  If you fsync a lot, then each one will be small, so you
shouldn't experience much latency, but throughput might suck.  If you
don't fsync very much, then you maximize the chances for
write-combining (because inserting an fsync between two writes to the
same block forces that block to be physically written twice rather
than just once) thus improving throughput, but when you do get around
to calling fsync() there may be a lot of data to write all at once,
and you may get a gigantic latency spike.

As far as I can tell, one fsync per checkpoint is the theoretical
minimum, and that's what we do now.  So our current system is
optimized for throughput.  The decision to put full-page images into
WAL rather than a separate buffer is essentially turning the dial in
the same direction, because, in effect, the double-write fsync
piggybacks on the WAL fsync which we must do anyway.  So both the
decision to use a double-write buffer AT ALL and the decision to fsync
more frequently to keep that buffer to a manageable size are going to
result in turning that dial in the opposite direction.  It seems to me
inevitable that, even with the best possible implementation,
throughput will get worse.  With a good implementation (though not with
a bad one), latency should improve.

Now, this is not necessarily a reason to reject the idea.  I believe
that several people have proposed that our current implementation is
*overly* optimized for throughput *at the expense of* latency, and
that we might want to provide some options that, in one way or
another, fsync more frequently, so that checkpoint spikes aren't as
bad.  But when it comes time to benchmark, we might need to think
somewhat carefully about what we're testing...
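
(To make that dial concrete, here is a small standalone toy -- not from
any patch in this thread; the file name and FSYNC_EVERY are invented.
Lowering FSYNC_EVERY gives small, predictable flushes; raising it lets
repeated writes to the same hot block combine before the one flush we
cannot avoid.)

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NWRITES      1024
#define BLOCK_SIZE   8192
#define FSYNC_EVERY  128     /* 1 favors latency, NWRITES favors throughput */

int
main(void)
{
    char  block[BLOCK_SIZE];
    int   fd = open("dwtest.dat", O_CREAT | O_WRONLY, 0600);
    int   i;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    memset(block, 'x', sizeof(block));

    for (i = 0; i < NWRITES; i++)
    {
        /* keep rewriting a small set of hot blocks, checkpoint-style */
        off_t off = (off_t) (i % 16) * BLOCK_SIZE;

        if (pwrite(fd, block, BLOCK_SIZE, off) != BLOCK_SIZE)
        {
            perror("pwrite");
            return 1;
        }
        if ((i + 1) % FSYNC_EVERY == 0)
            fsync(fd);
    }
    fsync(fd);               /* the flush we cannot avoid */
    close(fd);
    return 0;
}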

Another thought here is that double-writes may not be the best
solution, and are almost certainly not the easiest-to-implement
solution.  We could instead do something like this: when an unlogged
change is made to a buffer (e.g. a hint bit is set), we set a flag on
the buffer header.  When we evict such a buffer, we emit a WAL record
that just overwrites the whole buffer with a new FPI.  There are some
pretty obvious usage patterns where this is likely to be painful (e.g.
load a big table without setting hint bits, and then seq-scan it).
But there are also many use cases where the working set fits inside
shared buffers and data pages don't get written very often, apart from
checkpoint time, and those cases might work just fine.  Also, the
cases that are problems for this implementation are likely to also be
problems for a double-write based implementation, for exactly the same
reasons: if you discover at buffer eviction time that you need to
fsync something (whether it's WAL or DW), it's going to hurt.
Checksums aren't free even when using double-writes: if you don't have
checksums, pages that have only hint bit-changes don't need to be
double-written.  If double writes aren't going to give us anything
for free, maybe that's not the right place to be focusing our
efforts...
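
(A minimal toy sketch of that evict-time-FPI idea; names like
BM_HINT_DIRTY and log_full_page_image() are invented for illustration,
and this is not how bufmgr.c is actually structured:)

#include <stdint.h>

#define PAGE_SIZE     8192
#define BM_DIRTY      0x01   /* page has WAL-logged changes */
#define BM_HINT_DIRTY 0x02   /* page has only unlogged (hint bit) changes */

typedef struct ToyBuffer
{
    uint32_t flags;
    char     page[PAGE_SIZE];
} ToyBuffer;

/* Invented stand-in for emitting a full-page image to WAL. */
static void
log_full_page_image(const char *page)
{
    (void) page;             /* real code would XLogInsert() an FPI here */
}

/* Called wherever a hint bit is set without writing WAL. */
static void
mark_hint_dirty(ToyBuffer *buf)
{
    buf->flags |= BM_HINT_DIRTY;
}

/* At eviction: if only hints changed, WAL-log a fresh FPI first, so a
 * torn write can never leave a page whose checksum fails. */
static void
evict_buffer(ToyBuffer *buf)
{
    if (buf->flags & BM_HINT_DIRTY)
    {
        log_full_page_image(buf->page);
        buf->flags &= ~BM_HINT_DIRTY;
    }
    /* ... write the page out and clear BM_DIRTY ... */
}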

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 12:56 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a
scalar) but some other sort of animal:

Yeah, it's a version object, but I'd have thought that SvPV and friends
would automatically stringify such an object.  Otherwise, practically
any kind of perl extension could be crashed by passing it one, no?


But Util.xs::util_elog() expects an SV and doesn't check whether or not
it actually has one. I've found a few other ways of crashing this call
(e.g. by passing a typeglob), so maybe we need to test that we actually
have an SV. I think SvOK() is what we'd use for that - perl gurus please
confirm.

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)





I tested it and you're right, it doesn't help. I don't see what else we 
can do about it. There doesn't appear to be any test for an SV in the API.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-04 Thread Noah Misch
Git master can already avoid rewriting the table for column type changes like
varchar(10) -> varchar(20).  However, if the column in question is on either
side of a FK relationship, we always revalidate the foreign key.  Concretely,
I wanted these no-rewrite type changes to also assume FK validity:
- Any typmod-only change
- text -> varchar
- domain -> base type

To achieve that, this patch skips the revalidation when two conditions hold:

(a) Old and new pg_constraint.conpfeqop match exactly.  This is actually
stronger than needed; we could loosen things by way of operator families.
However, no core type would benefit, and my head exploded when I tried to
define the more-generous test correctly.

(b) The functions, if any, implementing a cast from the foreign type to the
primary opcintype are the same.  For this purpose, we can consider a binary
coercion equivalent to an exact type match.  When the opcintype is
polymorphic, require that the old and new foreign types match exactly.  (Since
ri_triggers.c does use the executor, the stronger check for polymorphic types
is no mere future-proofing.  However, no core type exercises its necessity.)

These follow from the rules used to decide when to rebuild an index.  I
further justify them in source comments.

To implement this, I have ATPostAlterTypeParse() stash the content of the old
pg_constraint.conpfeqop in the Constraint node.  ATAddForeignKeyConstraint()
notices that and evaluates the above rules.  If they both pass, it omits the
validation step just as though skip_validation had been given.

Thanks,
nm
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b52415..0cde503 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 276,281  static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 
int numattrs, int16 *attnums,
Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname,
 Relation rel, Relation 
pkrel,
***
*** 357,362  static void ATPostAlterTypeCleanup(List **wqueue, 
AlteredTableInfo *tab, LOCKMOD
--- 358,364 
  static void ATPostAlterTypeParse(Oid oldId, char *cmd,
 List **wqueue, LOCKMODE lockmode, bool 
rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
 Oid oldOwnerId, Oid 
newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***
*** 5696,5701  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5698,5705 
numpks;
Oid indexOid;
Oid constrOid;
+   bool        old_check_ok;
+   ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
/*
 * Grab an exclusive lock on the pk table, so that someone doesn't 
delete
***
*** 5812,5817  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5816,5828 
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 errmsg("number of referencing and referenced columns for foreign key disagree")));
  
+   /*
+* On the strength of a previous constraint, we might avoid scanning
+* tables to validate this one.  See below.
+*/
+   old_check_ok = (fkconstraint->old_conpfeqop != NIL);
+   Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop));
+ 
for (i = 0; i < numpks; i++)
{
Oid pktype = pktypoid[i];
***
*** 5826,5831  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5837,5843 
Oid ppeqop;
Oid ffeqop;
int16   eqstrategy;
+   Oid pfeqop_right;
  
/* We need several fields out of the pg_opclass entry */
cla_ht = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclasses[i]));
***
*** 5868,5877  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
ffeqop = 

Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Alex Hunsaker
On Wed, Jan 4, 2012 at 13:13, Andrew Dunstan and...@dunslane.net wrote:


 On 01/04/2012 12:56 PM, Tom Lane wrote:

 I looked at that last night but it appeared that SvOK would be perfectly
 happy.  (Didn't actually try it, though, I was just eyeballing the flags
 in gdb.)


 I tested it and you're right, it doesn't help. I don't see what else we can
 do about it. There doesn't appear to be any test for an SV in the API.

I think about the best we can do is something along the lines of:

sv2cstr()
{
...
    if (Perl_vverify(sv))
        return utf_u2e(SvPV(sv));
...
}

I don't think the utf_u2e is strictly needed (other than that it
strdups), as I don't think versions can have utf8 chars, or at least
that $^V will not have utf8 chars (and even if it did, it would only
cause problems if they had codepoints between 128 and 255).

We would still have issues with typeglobs...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jim Nasby j...@nasby.net wrote:
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.

 Those numbers wind up meaning that 18% of the 256-byte blocks (1024
 transactions each) were all commits.  Yikes.  That pretty much
 shoots down Robert's idea of summarized CLOG data, I think.

I'm not *totally* certain of that... another way to look at it is that
I have to be able to show a win even if only 18% of the probes into
the summarized data are successful, which doesn't seem totally out of
the question given how cheap I think lookups could be.  But I'll admit
it's not real encouraging.

I think the first thing we need to look at is increasing the number of
CLOG buffers.  Even if hypothetical summarized CLOG data had a 60% hit
rate rather than 18%, 8 CLOG buffers is probably still not going to be
enough for a 32-core system, let alone anything larger.  I am aware of
two concerns here:

1. Unconditionally adding more CLOG buffers will increase PostgreSQL's
minimum memory footprint, which is bad for people suffering under
default shared memory limits or running a database on a device with
less memory than a low-end cell phone.

2. The CLOG code isn't designed to manage a large number of buffers,
so adding more might cause a performance regression on small systems.

On Nate Boley's 32-core system, running pgbench at scale factor 100,
the optimal number of buffers seems to be around 32.  I'd like to get
some test results from smaller systems - any chance you (or anyone)
have, say, an 8-core box you could test on?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-04 Thread Robert Haas
On Tue, Jan 3, 2012 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 6:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 [ reasons ]

 I agree with these reasons.  We don't get charged $0.50 per GUC, so
 there's no particular reason to contort things to have fewer of them.

 Well, there definitely is a distributed cost to each additional GUC.
 Peter's given what are probably adequate reasons to add several of them
 here, but that doesn't mean we should not ask the question whether each
 new GUC is really necessary.

No argument.  I'm merely saying that I think the rationale for these
GUCs is solid enough to justify their existence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Paul Ramsey
Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher:

[12:03pm] RhodiumToad: what happens is this
[12:04pm] RhodiumToad: postquel_start knows this statement doesn't
return the result, so it supplies None_Receiver as the dest-receiver
for the query
[12:04pm] RhodiumToad: however, it knows it's a plannedStmt, so it
fires up the full executor to run it
[12:05pm] RhodiumToad: and the executor allocates a new destreceiver
in its own memory context, replaces es->qd->dest with it,
[12:05pm] RhodiumToad: (the new destreceiver is the one that writes
tuples to the created table)
[12:06pm] RhodiumToad: then at executorEnd (called from postquel_end),
executor shutdown closes the new rel, _and then frees the executor's
memory context, including the destreceiver it created
[12:07pm] RhodiumToad: postquel_end doesn't know that its setting of
->dest was clobbered, so it goes to try and destroy it again, and gets
garbage (if assertions are on)
[12:07pm] RhodiumToad: if assertions weren't on, then the rDestroy
call is harmless
[12:07pm] RhodiumToad: well, mostly harmless
[12:07pm] RhodiumToad: sneaky one, that
[12:09pm] RhodiumToad: you can confirm it by tracing through that
second call to postquel_end and confirming that it's the call to
ExecutorEnd that stomps the content of qd->dest
[12:12pm] pramsey: confirmed, the pass through ExecutorEnd has
clobbered the value so there's garbage when it arrives at line 638
[12:14pm] RhodiumToad: if you trace through ExecutorEnd itself, it
should be the FreeExecutorState that does it
[12:15pm] RhodiumToad: wonder how far back this bug goes
[12:16pm] RhodiumToad: actually not very far
[12:16pm] RhodiumToad: older versions just figured that qd->dest was
always None_Receiver and therefore did not need an rDestroy call
[12:17pm] RhodiumToad: (which is a no-op for None_Receiver)
[12:17pm] pramsey: kills my 8.4
[12:17pm] RhodiumToad: so this is broken in 8.4+
[12:17pm] pramsey: ah
[12:18pm] RhodiumToad: 8.4 introduced the lazy-eval of selects in sql functions
[12:19pm] RhodiumToad: prior to that they were always run immediately
to completion
[12:19pm] RhodiumToad: that requires juggling the destreceiver a bit,
hence the bug
[12:20pm] RhodiumToad: btw, the first statement of the function
shouldn't be needed
[12:21pm] RhodiumToad: just  ... as $f$ create table foo as select 1
as x; $f$;  should be enough to break it
[12:31pm] RhodiumToad: there's no trivial fix


On Wed, Jan 4, 2012 at 11:32 AM, Paul Ramsey pram...@cleverelephant.ca wrote:
 One extra detail, my PostgreSQL is compiled with --enable-cassert.
 This is required to set off the killer function.

 On Wed, Jan 04, 2012 at 07:17:17PM +0000, pram...@cleverelephant.ca wrote:
 The following bug has been logged on the website:

 Bug reference:      6379
 Logged by:          Paul Ramsey
 Email address:      pram...@cleverelephant.ca
 PostgreSQL version: 9.1.2
 Operating system:   OSX 10.6.8
 Description:

 CREATE OR REPLACE FUNCTION kill_backend()
 RETURNS VOID
 AS $$
   DROP TABLE if EXISTS foo;
   CREATE TABLE foo AS SELECT * FROM pg_class LIMIT 1;
 $$ LANGUAGE 'sql';

 SELECT kill_backend();


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-04 Thread Noah Misch
I neglected to commit after revising the text of a few comments; use this
version instead.  Thanks.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b52415..9eba8e8 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 276,281  static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 
int numattrs, int16 *attnums,
Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname,
 Relation rel, Relation 
pkrel,
***
*** 357,362  static void ATPostAlterTypeCleanup(List **wqueue, 
AlteredTableInfo *tab, LOCKMOD
--- 358,364 
  static void ATPostAlterTypeParse(Oid oldId, char *cmd,
 List **wqueue, LOCKMODE lockmode, bool 
rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
 Oid oldOwnerId, Oid 
newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***
*** 5696,5701  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5698,5705 
numpks;
Oid indexOid;
Oid constrOid;
+   bool        old_check_ok;
+   ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
/*
 * Grab an exclusive lock on the pk table, so that someone doesn't 
delete
***
*** 5812,5817  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5816,5828 
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 errmsg("number of referencing and referenced columns for foreign key disagree")));
  
+   /*
+* On the strength of a previous constraint, we might avoid scanning
+* tables to validate this one.  See below.
+*/
+   old_check_ok = (fkconstraint->old_conpfeqop != NIL);
+   Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop));
+ 
for (i = 0; i < numpks; i++)
{
Oid pktype = pktypoid[i];
***
*** 5826,5831  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5837,5843 
Oid ppeqop;
Oid ffeqop;
int16   eqstrategy;
+   Oid pfeqop_right;
  
/* We need several fields out of the pg_opclass entry */
cla_ht = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclasses[i]));
***
*** 5868,5877  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,

 eqstrategy);
else
!   ffeqop = InvalidOid;/* keep compiler quiet */
  
if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
{
--- 5880,5896 
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
+   {
+   pfeqop_right = fktyped;
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,

 eqstrategy);
+   }
else
!   {
!   /* keep compiler quiet */
!   pfeqop_right = InvalidOid;
!   ffeqop = InvalidOid;
!   }
  
if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
{
***
*** 5893,5899  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5912,5921 
target_typeids[1] = opcintype;
if (can_coerce_type(2, input_typeids, target_typeids,

COERCION_IMPLICIT))
+   {

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I think it may also be tricky to make sure that a backend that
 needs to write a dirty buffer doesn't end up having to wait for a
 double-write to be fsync'd.
 
This and other parts of your post seem to ignore the BBU write-back
cache.  Multiple fsyncs of a single page can still be collapsed at
that level to a single actual disk write.  In fact, I rather doubt
this technology will look very good on machines without write-back
caching.  I'm not as sure as you are that this is a net loss in
throughput, either.  When the fsync storm clogs the RAID controller,
even reads stall, so something which more evenly pushes writes to
disk might avoid these non-productive pauses.  I think that could
improve throughput enough to balance or exceed the other effects. 
Maybe.  I agree we need to be careful to craft a good set of
benchmarks here.
 
 Checksums aren't free even when using double-writes: if you don't
 have checksums, pages that have only hint bit-changes don't need
 to be double-written.
 
Agreed.  Checksums aren't expected to be free under any
circumstances.  I'm expecting DW to be slightly faster than FPW in
general, with or without in-page checksums.
 
 If double writes aren't going to give us anything for free,
 maybe that's not the right place to be focusing our
 efforts...
 
I'm not sure why it's not enough that they improve performance over
the alternative.  Making some other feature with obvious overhead
free seems an odd requirement to hang on this.  (Maybe I'm
misunderstanding you on that point?)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 12:56 PM, Tom Lane wrote:
 I looked at that last night but it appeared that SvOK would be perfectly
 happy.  (Didn't actually try it, though, I was just eyeballing the flags
 in gdb.)

 I tested it and you're right, it doesn't help. I don't see what else we 
 can do about it. There doesn't appear to be any test for an SV in the API.

I think what's being passed *is* an SV --- at least, the contents look
reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
this particular kind of SV.  Googling suggests that SvPVutf8 used to
fail on READONLY SVs, of which this is one if I'm reading the flag bits
correctly; but that was supposedly fixed years ago.  I believe we've hit
some other undocumented limitation of that function, which the Perl guys
may or may not acknowledge as a bug once we've tracked it down better.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 2. The CLOG code isn't designed to manage a large number of
 buffers, so adding more might cause a performance regression on
 small systems.
 
 On Nate Boley's 32-core system, running pgbench at scale factor
 100, the optimal number of buffers seems to be around 32.  I'd
 like to get some test results from smaller systems - any chance
 you (or anyone) have, say, an 8-core box you could test on?
 
Hmm.  I can think of a lot of 4-core servers I could test on.  (We
have a few poised to go into production where it would be relatively
easy to do benchmarking without distorting factors right now.) 
After that we jump to 16 cores, unless I'm forgetting something. 
These are currently all in production, but some of them are
redundant machines which could be pulled for a few hours here and
there for benchmarks.  If either of those seem worthwhile, please
spec the useful tests so I can capture the right information.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But, the OP makes me wonder: why can a standby only perform a
 restartpoint where the master performed a checkpoint?  It seems like a
 standby ought to be able to create a restartpoint anywhere, just by
 writing everything, flushing it to disk, and updating pg_control.

Perhaps, but then crash restarts would have to accept start pointers
that point at any random place in the WAL.  I like the additional error
 checking of verifying that there's a checkpoint record there.  Also
I think the full-page-write mechanism would no longer protect against
torn pages during replay if you did that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 Further notes, from Andrew (RhodiumToad) on IRC about the cause of this 
 crasher:
 [12:31pm] RhodiumToad: there's no trivial fix

IMO the main bug here is that functions.c isn't expecting qd->dest to be
overwritten, so we could work around it by keeping a separate private
copy of the dest pointer.  However, it would also be fair to ask whether
there's not a cleaner solution.  Perhaps the intoRel stuff should be
saving/restoring the original destreceiver instead of just blindly
overwriting it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 If double writes aren't going to give us anything for free,
 maybe that's not the right place to be focusing our
 efforts...

 I'm not sure why it's not enough that they improve performance over
 the alternative.  Making some other feature with obvious overhead
 free seems an odd requirement to hang on this.  (Maybe I'm
 misunderstanding you on that point?)

Well, this thread is nominally about checksums, but here we are
talking about double writes, so I thought we were connecting those
features in some way?

Certainly, life is easier if we can develop them completely separately
- but checksums really ought to come with some sort of solution to the
problem of a torn-page with hint bit changes, IMO, and I thought
that's why we were thinking so hard about DW just now.

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 But, the OP makes me wonder: why can a standby only perform a
 restartpoint where the master performed a checkpoint?  It seems like a
 standby ought to be able to create a restartpoint anywhere, just by
 writing everything, flushing it to disk, and updating pg_control.

 Perhaps, but then crash restarts would have to accept start pointers
 that point at any random place in the WAL.  I like the additional error
 checking of verifying that there's a checkpoint record there.

I could go either way on that one, but...

 Also
 I think the full-page-write mechanism would no longer protect against
 torn pages during replay if you did that.

...this is a very good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Jan 4, 2012, at 2:02 PM, Kevin Grittner wrote:
 Jim Nasby j...@nasby.net wrote:
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.
 
 Those numbers wind up meaning that 18% of the 256-byte blocks (1024
 transactions each) were all commits.  Yikes.  That pretty much
 shoots down Robert's idea of summarized CLOG data, I think.

Here's another data point. This is for a londiste slave of what I posted 
earlier. Note that this slave has no users on it.
054A 654
054B 835
054C 973
054D 1020
054E 1012
054F 1022
0550 284


And these clog files are from Sep 15-30... I believe that's the period when we 
were building this slave, but I'm not 100% certain.

04F0 194
04F1 253
04F2 585
04F3 243
04F4 176
04F5 164
04F6 358
04F7 505
04F8 168
04F9 180
04FA 369
04FB 318
04FC 236
04FD 437
04FE 242
04FF 625
0500 222
0501 139
0502 174
0503 91
0504 546
0505 220
0506 187
0507 151
0508 199
0509 491
050A 232
050B 170
050C 191
050D 414
050E 557
050F 231
0510 173
0511 159
0512 436
0513 789
0514 354
0515 157
0516 187
0517 333
0518 599
0519 483
051A 300
051B 512
051C 713
051D 422
051E 291
051F 596
0520 785
0521 825
0522 484
0523 238
0524 151
0525 190
0526 256
0527 403
0528 551
0529 757
052A 837
052B 418
052C 256
052D 161
052E 254
052F 423
0530 469
0531 757
0532 627
0533 325
0534 224
0535 295
0536 290
0537 352
0538 561
0539 565
053A 833
053B 756
053C 485
053D 276
053E 241
053F 270
0540 334
0541 306
0542 700
0543 821
0544 402
0545 199
0546 226
0547 250
0548 354
0549 587


This is for a slave of that database that does have user activity:

054A 654
054B 835
054C 420
054D 432
054E 852
054F 666
0550 302
0551 243
0552 600
0553 295
0554 617
0555 504
0556 232
0557 304
0558 580
0559 156

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 2. The CLOG code isn't designed to manage a large number of
 buffers, so adding more might cause a performance regression on
 small systems.

 On Nate Boley's 32-core system, running pgbench at scale factor
 100, the optimal number of buffers seems to be around 32.  I'd
 like to get some test results from smaller systems - any chance
 you (or anyone) have, say, an 8-core box you could test on?

 Hmm.  I can think of a lot of 4-core servers I could test on.  (We
 have a few poised to go into production where it would be relatively
 easy to do benchmarking without distorting factors right now.)
 After that we jump to 16 cores, unless I'm forgetting something.
 These are currently all in production, but some of them are
 redundant machines which could be pulled for a few hours here and
 there for benchmarks.  If either of those seem worthwhile, please
 spec the useful tests so I can capture the right information.

Yes, both of those seem useful.  To compile, I do this:

./configure --prefix=$HOME/install/$BRANCHNAME --enable-depend
--enable-debug ${EXTRA_OPTIONS}
make
make -C contrib/pgbench
make check
make install
make -C contrib/pgbench install

In this case, the relevant builds would probably be (1) master, (2)
master with NUM_CLOG_BUFFERS = 16, (3) master with NUM_CLOG_BUFFERS =
32, and (4) master with NUM_CLOG_BUFFERS = 48.  (You could also try
intermediate numbers if it seems warranted.)
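
(For anyone reproducing this: assuming the constant is still the
compile-time define it was at the time, each non-master build is a
one-line edit, e.g.)

/* src/include/access/clog.h -- the stock value is 8 */
#define NUM_CLOG_BUFFERS    32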

Basic test setup:

rm -rf $PGDATA
~/install/master/bin/initdb
cat > $PGDATA/postgresql.conf <<EOM
shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
EOM

I'm attaching a driver script you can modify to taste.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


runtestw
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 03:56 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/04/2012 12:56 PM, Tom Lane wrote:

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)

I tested it and you're right, it doesn't help. I don't see what else we
can do about it. There doesn't appear to be any test for an SV in the API.

I think what's being passed *is* an SV --- at least, the contents look
reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
this particular kind of SV.  Googling suggests that SvPVutf8 used to
fail on READONLY SVs, of which this is one if I'm reading the flag bits
correctly; but that was supposedly fixed years ago.  I believe we've hit
some other undocumented limitation of that function, which the Perl guys
may or may not acknowledge as a bug once we've tracked it down better.





Well, the crash is apparently solved by the following, which your 
investigation suggested to me:



diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..0785e2e 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -41,7 +41,7 @@ do_util_elog(int level, SV *msg)

PG_TRY();
{
-   cmsg = sv2cstr(msg);
+   cmsg = sv2cstr(newSVsv(msg));
    elog(level, "%s", cmsg);
pfree(cmsg);
}


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 03:56 PM, Tom Lane wrote:
 I think what's being passed *is* an SV --- at least, the contents look
 reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
 this particular kind of SV.  Googling suggests that SvPVutf8 used to
 fail on READONLY SVs, of which this is one if I'm reading the flag bits
 correctly; but that was supposedly fixed years ago.  I believe we've hit
 some other undocumented limitation of that function, which the Perl guys
 may or may not acknowledge as a bug once we've tracked it down better.

 Well, the crash is apparently solved by the following, which your 
 investigation suggested to me:
 -   cmsg = sv2cstr(msg);
 +   cmsg = sv2cstr(newSVsv(msg));

That's kinda grotty ... and leaky ...

I installed perl-debuginfo and soon found that SvPVutf8 leads to here:

(gdb) s
9066        Perl_croak(aTHX_ "Can't coerce readonly %s to string in %s",
(gdb) bt
#0  Perl_sv_pvn_force_flags (my_perl=0x17f3170, sv=0x18b6c50, 
lp=0x7fffb0c8e2f8, flags=<optimized out>) at sv.c:9066
#1  0x0038c30c7003 in Perl_sv_utf8_upgrade_flags_grow (my_perl=0x17f3170, 
sv=0x18b6c50, flags=2, extra=0) at sv.c:3228
#2  0x0038c30c7778 in Perl_sv_2pvutf8 (my_perl=0x17f3170, sv=0x18b6c50, 
lp=0x7fffb0c8e370) at sv.c:3079
#3  0x7f4308447614 in sv2cstr (sv=0x18b6c50) at plperl_helpers.h:54
#4  0x7f430844771f in do_util_elog (level=18, msg=0x18b6c50) at Util.xs:44
#5  0x7f4308447bdc in XS__elog (my_perl=0x17f3170, cv=0x181e008)
at Util.xs:105
#6  0x0038c30b548f in Perl_pp_entersub (my_perl=0x17f3170) at pp_hot.c:3046
#7  0x0038c30ac796 in Perl_runops_standard (my_perl=0x17f3170) at run.c:41
#8  0x0038c30480ae in Perl_call_sv (my_perl=0x17f3170, sv=0x19843e0, 
flags=10) at perl.c:2647
#9  0x7f4308440f3e in plperl_call_perl_func (desc=0x7fffb0c8e8b0, 
fcinfo=0x7fffb0c8fe10) at plperl.c:2018
#10 0x7f430843fa99 in plperl_inline_handler (fcinfo=0x7fffb0c902a0)
at plperl.c:1751

which leads to a few conclusions:

1. SvPVutf8 fails on readonly SVs, despite the fact that no such
limitation is documented and that this was supposedly fixed in 2004, cf
http://www.nntp.perl.org/group/perl.perl5.porters/2004/03/msg89505.html
We ought to hold somebody's feet to the fire about that.  I don't really
expect any response beyond documenting the limitation in perlapi, but
at least they ought to do that.

2. A slightly cleaner fix for this should be to duplicate the SV and
then release the copy around the SvPVutf8 call, only if it's readonly
(a sketch follows below).  Fixing it in do_util_elog is entirely the
wrong thing.

3. Perl_croak inside a PG_TRY block fails quite nastily.  I think we
might be well advised to move the sv2cstr(msg) call outside the PG_TRY,
but I'm wondering whether there is not a more general structural problem
in plperl concerning nesting of PG and Perl error recovery.
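
(To spell out point 2, here is a minimal, untested sketch, assuming
only the general shape of sv2cstr() in plperl_helpers.h; the strdup()
stands in for the palloc'd copy the real helper makes:)

#include <string.h>
#include "EXTERN.h"
#include "perl.h"

/* Sketch only: stringify a duplicate when the SV is readonly, since
 * SvPVutf8() croaks on readonly SVs (see point 1 above). */
static char *
sv2cstr_sketch(SV *sv)
{
    char   *val;
    char   *res;
    STRLEN  len;
    int     copied = 0;

    if (SvREADONLY(sv))
    {
        sv = newSVsv(sv);       /* temporary writable copy */
        copied = 1;
    }

    val = SvPVutf8(sv, len);
    res = strdup(val);          /* copy out before dropping the duplicate */

    if (copied)
        SvREFCNT_dec(sv);       /* release the temporary copy */

    return res;
}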

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 01:13 PM, Tom Lane wrote:

In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php
it's pointed out that recent versions of pg_restore fall over on
archives made with -Fc --inserts (or --column-inserts), but only when
restoring direct to database; if you ask for text output it's perfectly
fine.  Investigation shows that the problem is that individual INSERT
commands are being broken apart at arbitrary buffer boundaries --- you
don't see any problem in text output, but when the bufferloads are
submitted as separate PQexec calls, of course bad things happen.

I believe this worked okay up until my patch here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a
which removed the mini SQL lexer in pg_backup_db.c.  I had supposed that
that had no useful function except to separate COPY data from
not-COPY-data, but in reality it had another function of ensuring that
INSERT commands split across zlib bufferload boundaries would get
reassembled before they are submitted to PQexec.

Not entirely sure what to do about this.  We could consider reverting
the aforesaid patch and trying to find another way of fixing that code's
failure to cope with standard-conforming strings, but I'm not sure that
there's a good way to know whether standard_conforming_strings is active
here, and anyway that code was ugly as sin and I'd rather not resurrect
it.  But on the other hand, there are no clear line boundaries in the
compressed data, and we can't introduce any without (a) worsening
compression and (b) breaking compatibility with existing dump files.

Anybody have any bright ideas?  I'm fresh out at the moment.





I pondered this while out on my daily constitutional. The first thing 
that occurred to me is that it would possibly have been better if we'd 
made pg_dump not use a quoting mechanism whose behaviour is dependent on 
a setting (e.g. E'' or dollar quoting). But I guess that's water under 
the bridge.


Could we detect an appropriate line ending in ahwrite() after it's been 
decompressed and buffer partial lines accordingly?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 05:05 PM, Tom Lane wrote:



Well, the crash is apparently solved by the following, which your
investigation suggested to me:
-   cmsg = sv2cstr(msg);
+   cmsg = sv2cstr(newSVsv(msg));

That's kinda grotty ... and leaky ...



Of course it is. It wasn't meant as a solution but as validation of your 
suspicions about the nature of the problem. (The leakiness could be 
solved, though.)


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-04 Thread Jim Nasby
On Dec 20, 2011, at 11:29 PM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 So, what do we do about this?  The obvious answer is increase
 NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea.
 
 As you say, that's likely to hurt people running in small shared
 memory.  I too have thought about merging the SLRU areas into the main
 shared buffer arena, and likewise have concluded that it is likely to
 be way more painful than it's worth.  What I think might be an
 appropriate compromise is something similar to what we did for
 autotuning wal_buffers: use a fixed percentage of shared_buffers, with
 some minimum and maximum limits to ensure sanity.  But picking the
 appropriate percentage would take a bit of research.

ISTM that this is based more on the number of CPUs than on total memory, no? 
Likewise, things like the number of shared buffer partitions would be highly 
dependent on the number of CPUs.

So perhaps we should either probe the number of CPUs on a box, or have a GUC to 
tell us how many there are...
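
(For what it's worth, probing the CPU count is only a few lines on
POSIX-ish systems -- a sketch, assuming _SC_NPROCESSORS_ONLN is
available, which is a common extension rather than strict POSIX:)

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* CPUs currently online */

    printf("online CPUs: %ld\n", ncpus);
    return ncpus > 0 ? 0 : 1;
}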
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2012-01-04 Thread Jim Nasby
On Dec 19, 2011, at 12:31 PM, Simon Riggs wrote:
 On Sun, Dec 18, 2011 at 4:22 PM, Jim Nasby j...@nasby.net wrote:
 On Dec 18, 2011, at 2:28 AM, Gianni Ciolli wrote:
 I have written some notes about autonomous subtransactions, which have
 already been touched (at least) in two separate threads; please find
 them at
 
  http://wiki.postgresql.org/wiki/Autonomous_subtransactions
 
 The document seems to mix the terms subtransaction and autonomous 
 transaction. That's going to generate a ton of confusion, because both terms 
 already have meaning associated with them:
 
 - Autonomous transaction means you can execute something outside of your 
 current transaction and it is in no way affected by the current transaction 
 (doesn't matter if T0 commits or not).
 - Subtransactions are an alternative to savepoints. They allow you to break 
 a large transaction into smaller chunks, but if T0 doesn't commit then none 
 of the subtransactions do either.
 
 OK, perhaps we should just stick to the term Autonomous Transaction.
 That term is in common use, even if the usage is otherwise exactly the
 same as a subtransaction i.e. main transaction stops until the
 subtransaction is complete.

Except AFAIR Oracle uses the term to indicate something that is happening 
*outside* of your current transaction, which is definitely not what the 
proposal is talking about. I'm not wed to subtransaction (though I think it's 
a perfectly good name for this), but I definitely think calling this an 
autonomous transaction would be bad.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 2:48 PM, Andrew Dunstan wrote:

 That's kinda grotty ... and leaky ...
 
 
 Of course it is. It wasn't meant as a solution but as validation of your 
 suspicions about the nature of the problem. (The leakiness could be solved, 
 though.)

From #p5p on irc.perl.org:

[10:58pm]dg:interesting, so SvPV() handles string overloading, but SvPVutf8() 
doesn't, yet: Like C<SvPV>, but converts sv to utf8 first if necessary.
[10:58pm]dg:oh, only for readonly objects
[10:58pm]dg:probably why no-one has noticed, as version is probably the only 
readonly thing with string overloading
[11:08pm]TonyC:it doesn't need string overloading
[11:09pm]TonyC:https://gist.github.com/1562734
[11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() should 
prevent the leak in that workaround, assuming there's no FREETMPS between the 
call and use of the return value

Useful?

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 01:13 PM, Tom Lane wrote:
 Not entirely sure what to do about this.  We could consider reverting
 the aforesaid patch and trying to find another way of fixing that code's
 failure to cope with standard-conforming strings, but I'm not sure that
 there's a good way to know whether standard_conforming_strings is active
 here, and anyway that code was ugly as sin and I'd rather not resurrect
 it.  But on the other hand, there are no clear line boundaries in the
 compressed data, and we can't introduce any without (a) worsening
 compression and (b) breaking compatibility with existing dump files.
 
 Anybody have any bright ideas?  I'm fresh out at the moment.

 I pondered this while out on my daily constitutional. The first thing 
 that occurred to me is that it would possibly have been better if we'd 
 made pg_dump not use a quoting mechanism whose behaviour is dependent on 
 a setting (e.g. E'' or dollar quoting). But I guess that's water under 
 the bridge.

 Could we detect an appropriate line ending in ahwrite() after it's been 
 decompressed and buffer partial lines accordingly?

Not easily: there could be newlines embedded in data strings or SQL
identifiers.  I'm not seeing any way around this except to restore the
minimal lexing capability.  One thing we could probably do is to
restrict it to be used only when reading table data, and continue to
assume that object creation commands can be emitted as-is.  That would
at least get us out of needing to parse dollar-quoted literals, which
aren't used in the INSERT commands.  But we'd have to deal with
standard-conforming strings some way.  The idea I had about that, since
we have an open database connection at hand, is to check the connected
backend's standard_conforming_strings state via PQparameterStatus.  If
it doesn't have the right setting then things are going to fail anyway.
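
(A minimal standalone sketch of that check; error handling trimmed,
and connection parameters come from the environment:)

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");
    const char *scs;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* NULL means a pre-8.2 server that never reports this GUC */
    scs = PQparameterStatus(conn, "standard_conforming_strings");
    printf("standard_conforming_strings: %s\n", scs ? scs : "(not reported)");

    PQfinish(conn);
    return 0;
}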

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Tom Lane
I wrote:
 Perhaps the intoRel stuff should be
 saving/restoring the original destreceiver instead of just blindly
 overwriting it.

I concluded that was the best fix, and have committed it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-04 Thread Samuel PHAN
I agree with Pavel also. Putting these constants in the pg_catalog isn't
the cleanest solution.

Though one can make one's own little lib in Python, Perl, whatever, to store
these constants, it would be better if, during compilation, these C
constants were exported in some way for PL/*.

I can't really speak for procedural languages other than PL/Python, but
for this one, typically, it would be nice to have them in the *plpy* module.

result = plpy.execute(sql_query)
if result.status() == plpy.SPI_OK_SELECT:
    # something...

Since the PG developers are the ones who decide these constant values, it's
quite logical that the equivalent constants for each PL should be made
available by the PG developers, rather than each geek reinventing the wheel
on their own.

Well, it's not a crucial feature of course, but a nice to have, that was my
point ;-)

Have a nice day, guys,

Samuel


On Wed, Jan 4, 2012 at 3:11 AM, Andrew Dunstan and...@dunslane.net wrote:



 On 01/03/2012 08:40 PM, Robert Haas wrote:

On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 I'd suppose it interesting to add a table to pg_catalog containing this
 data.

 - it is useless overhead

 I tend to agree.

  I am expecting so definition some constants in Perl, Python is simple

 Presumably one could instead write a script to transform the list of
 constants into a .pm file that could be loaded into the background, or
 whatever PL/python's equivalent of that concept is.  Not sure if
 there's a better way to do it.


 Yeah, I'm with you and Pavel. Here's my quick perl one-liner to produce a
 set of SPI_* constants for pl/perl. I'm looking at the best way to include
 this in the bootstrap code.

   perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; }
   print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;' \
   src/include/executor/spi.h


 cheers

 andrew




Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 06:20 PM, Tom Lane wrote:

But we'd have to deal with
standard-conforming strings some way.  The idea I had about that, since
we have an open database connection at hand, is to check the connected
backend's standard_conforming_strings state via PQparameterStatus.  If
it doesn't have the right setting then things are going to fail anyway.



Do we care what it is? Or can we just issue a SET to make it what it 
needs to be?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2012-01-04 Thread Gianni Ciolli
On Wed, Jan 04, 2012 at 04:58:08PM -0600, Jim Nasby wrote:
 Except AFAIR Oracle uses the term to indicate something that is
 happening *outside* of your current transaction, which is definitely
 not what the proposal is talking about.

That feature is commonly translated in PostgreSQL to a dblink-based
solution, which itself is not distant from the current proposal, at
least in terms of inside/outside (the biggest difference I can see is
on sharing temporary tables).

But I am not sure I understand your remark; it would be clearer to me
if you could provide an example explaining the difference.

Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 06:20 PM, Tom Lane wrote:
 But we'd have to deal with
 standard-conforming strings some way.  The idea I had about that, since
 we have an open database connection at hand, is to check the connected
 backend's standard_conforming_strings state via PQparameterStatus.  If
 it doesn't have the right setting then things are going to fail anyway.

 Do we care what it is?

Well, yeah, we care.

 Or can we just issue a SET to make it what it needs to be?

We already did, but I don't think the code in pg_backup_db has
convenient access to the value.  I might be wrong about that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 06:15 PM, David E. Wheeler wrote:
[11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() 
should prevent the leak in that workaround, assuming there's no 
FREETMPS between the call and use of the return value


That's the solution to leakiness I had in mind.

Tom said:


2. A slightly cleaner fix for this should be to duplicate the SV and
then release the copy around the SvPVutf8 call, only if it's readonly.
Fixing it in do_util_elog is entirely the wrong thing.


How do we tell if it's readonly?

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom said:
 2. A slightly cleaner fix for this should be to duplicate the SV and
 then release the copy around the SvPVutf8 call, only if it's readonly.
 Fixing it in do_util_elog is entirely the wrong thing.

 How do we tell if it's readonly?

SvREADONLY(sv) macro.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
Thanks, I missed that.

After doing these changes, following is the observation:

1. The size of the pgstat file is 86KB. It was last modified when I moved the
file location to the RAM disk.
2. The issue persists. I am seeing continuous logs:

2012-01-05 00:00:06 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:14 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:26 JST WARNING:  pgstat wait timeout
.
.
.
2012-01-05 15:36:25 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:37 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:45 JST WARNING:  pgstat wait timeout


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5121894.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers