Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-08 Thread Simon Riggs
On 8 June 2012 05:01, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 On 7 June 2012 18:03, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 7, 2012 at 12:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Clearly, delaying checkpoint indefinitely would be a high risk choice.
 But they won't be delayed indefinitely, since changes cause WAL
 records to be written and that would soon cause another checkpoint.

 But that's exactly the problem - it might not be soon at all.

 Your customer's use-case seems very narrow, and your complaint seems
 unusual to me, but couldn't you just get the customer to force
 checkpoints in a cronjob or something? CheckPointStmt will force,
 provided !RecoveryInProgress().
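
For reference, the cron workaround described above can be as simple as a
crontab entry along these lines (a sketch, assuming a local superuser
named postgres and a five-minute cadence):

# hypothetical crontab entry: force a checkpoint every five minutes
*/5 * * * *  psql -X -U postgres -c "CHECKPOINT;" postgres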

 I think both you and Simon have completely missed the point.  This
 is not a use case in the sense of someone doing it deliberately.
 This is about data redundancy, ie, if you lose your WAL through some
 unfortunate mishap, are you totally screwed or is there a reasonable
 chance that the data is on-disk in the main data store?  I would guess
 that the incidents Robert has been talking about were cases where EDB
 were engaged to do crash recovery, and were successful precisely because
 PG wasn't wholly dependent on the WAL copy of the data.

Apart from the likely point that hint bits exist on disk...

 This project has always put reliability first.  It's not clear to me why
 we would compromise that across-the-board in order to slightly reduce
 idle load in replication configurations.  Yeah, it's probably not a
 *large* compromise ... but it is a compromise, and one that doesn't
 seem to me to be very well-advised.  We can fix the idle-load issue
 without compromising on this basic goal; it will just take more than
 a ten-line patch to do it.

So now the standard for my patches is that I must consider what will
happen if the xlog is deleted?

Tell me that such a rule is applied uniformly to all patches and I would be happy.


I will revert, not because of the sense in this argument but because
you personally ask for it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-06-08 Thread Andres Freund
Hi,

On Friday, June 08, 2012 01:42:22 AM Simon Riggs wrote:
 On 7 June 2012 21:08, Andres Freund and...@2ndquadrant.com wrote:
  Moved the wakeup to a logical place outside a critical section.
  
  Hm. I don't really like the way you implemented that. While it reduces
  the likelihood quite a bit, it will still miss wakeups if an XLogInsert
  pushes out the data because of missing space or if any place does an
  XLogFlush(lsn). The knowledge is really only available in XLogWrite...
 
 Right, but the placement inside the critical section was objected to.
And that objection was later somewhat eased by Tom. There are currently still 
several WalSndWakeup calls in critical sections, several other uses of 
latches in critical sections, and several in signal handlers (which may run 
during a critical section). That's why I added comments to SetLatch documenting 
that they need to be signal/concurrency safe. (Which they are atm.)

 This way, any caller of XLogFlush() will be swept up at least once per
 wal_writer_delay, so missing a few calls doesn't mean we have spikes
 in replication delay.
Unfortunately it does. At least I measure it here ;) (obviously less than the 
previous 7 seconds). It's not that surprising, though: especially during high, 
or even more so bursty, WAL activity a backend might decide to write out WAL 
itself. In that case the background writer doesn't necessarily have 
anything to write out, so it won't wake the walsender.
Under high load the number of wakeups is two or three times as high *before* my 
patch as afterwards (with synchronous_commit=on, obviously). As you most 
definitely know (group commit work et al.), in a concurrent situation many of 
the wakeups from the current location (RecordTransactionCommit) are useless 
because the xlog was already flushed by another backend or the background 
writer before we get to do it.
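
To make the earlier XLogWrite point concrete: every path that writes or
flushes WAL funnels through XLogWrite(), unlike RecordTransactionCommit,
so a wakeup placed there cannot miss a flush. A rough sketch (placement
and condition are illustrative, not the patch as posted):

/*
 * Sketch: at the bottom of XLogWrite(), once LogwrtResult has been
 * advanced and the newly flushed WAL is durably on disk.
 */
if (max_wal_senders > 0)
    WalSndWakeup();     /* sets each walsender's latch */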

 Doing it more frequently was also an objection from Fujii, to which we
 must listen.
I had hoped that I argued successfully against that, but obviously not ;)

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Andres Freund
Congratulations Kevin!
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Kyotaro HORIGUCHI
Hello, I am restarting this patch for this CommitFest.

The requirements for this patch are as follows.

- What I want is similar checkpoint-progression behavior between the
  master and a (hot) standby. Specifically, checkpoints during
  streaming replication should run at the speed governed by
  checkpoint_segments. The point of this patch is to keep an
  unexpectedly large number of WAL segments from accumulating on the
  standby side. (Plus, it increases the chance of skipping the
  recovery-end checkpoint with my other patch.)

- This patch shouldn't affect archive recovery (as opposed to
  streaming). Checkpoint activity while recovering from a WAL archive
  (precisely: during archive recovery with no WAL receiver launched)
  is throttled to the checkpoint_timeout level, as before.

- It might be better if the acceleration could be inhibited, but this
  patch does not have that feature. Is it needed?


After considering the past discussion and the other patch I'm about to
put on the table, the outline of this patch is as follows.

- Check whether we are under streaming replication using a new
  function, WalRcvStarted(), which tells whether a WAL receiver has
  been launched so far.

  - The implementation of WalRcvStarted() has changed from the
    previous one. The flag is now set in WalReceiverMain, at the point
    where walRcvState becomes WALRCV_RUNNING. The previous
    implementation, which read WalRcvInProgress(), was useless for my
    other patch.

- Determine whether to delay the checkpoint using
  GetXLogReplayRecPtr() instead of GetInsertRecPtr() when
  WalRcvStarted() returns true.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 6aeade9..cb2509a 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -46,6 +46,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
+#include "replication/walreceiver.h"
 #include "replication/syncrep.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
@@ -493,8 +494,8 @@ CheckpointerMain(void)
 			 * Initialize checkpointer-private variables used during checkpoint
 			 */
 			ckpt_active = true;
-			if (!do_restartpoint)
-				ckpt_start_recptr = GetInsertRecPtr();
+			ckpt_start_recptr =
+				do_restartpoint ? GetXLogReplayRecPtr(NULL) : GetInsertRecPtr();
 			ckpt_start_time = now;
 			ckpt_cached_elapsed = 0;
 
@@ -747,6 +748,7 @@ IsCheckpointOnSchedule(double progress)
 	struct timeval now;
 	double		elapsed_xlogs,
 				elapsed_time;
+	bool		recovery_in_progress;
 
 	Assert(ckpt_active);
 
@@ -763,18 +765,26 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
 	 * computed before calling CreateCheckPoint. The code in XLogInsert that
 	 * actually triggers a checkpoint when checkpoint_segments is exceeded
-	 * compares against RedoRecptr, so this is not completely accurate.
-	 * However, it's good enough for our purposes, we're only calculating an
-	 * estimate anyway.
+	 * compares against RedoRecPtr.  Similarly, we consult WAL replay location
+	 * instead on hot standbys, and XLogPageRead compares it against RedoRecPtr,
+	 * too.  Although these are not completely accurate, it's good enough for
+	 * our purposes, we're only calculating an estimate anyway.
+	 */
+
+	/*
+	 * Don't govern progress by segments during archive recovery.
 	 */
-	if (!RecoveryInProgress())
+	recovery_in_progress = RecoveryInProgress();
+	if (!recovery_in_progress || WalRcvStarted())
 	{
-		recptr = GetInsertRecPtr();
+		recptr = recovery_in_progress ? GetXLogReplayRecPtr(NULL) :
+			GetInsertRecPtr();
 		elapsed_xlogs =
 			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
 			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d63ff29..7d57ad7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -215,6 +215,7 @@ WalReceiverMain(void)
 	/* Advertise our PID so that the startup process can kill us */
 	walrcv->pid = MyProcPid;
 	walrcv->walRcvState = WALRCV_RUNNING;
+	walrcv->started = true;
 
 	/* Fetch information required to start streaming */
 	strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index f8dd523..c3b26e9 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -31,6 +31,7 @@
 #include "utils/timestamp.h"
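
The walreceiverfuncs.c hunk is cut off here in the archive; presumably
it adds the reader side of the new flag, along the lines of this sketch
(the "started" field is the one set in WalReceiverMain; details
assumed):

/*
 * Returns true once a walreceiver has been launched at any point
 * during this recovery, even if it has since exited.
 */
bool
WalRcvStarted(void)
{
	/* use volatile pointer to prevent code rearrangement */
	volatile WalRcvData *walrcv = WalRcv;
	bool		started;

	SpinLockAcquire(&walrcv->mutex);
	started = walrcv->started;
	SpinLockRelease(&walrcv->mutex);

	return started;
}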
 
 

[HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-08 Thread Kyotaro HORIGUCHI
Hello,

I have a problem with promotion from hot standby: the exclusive
checkpoint delays completion of the promotion.

This checkpoint is a shutdown checkpoint, as a convention related to
the TLI increment, according to the comment shown below. I take a
shutdown checkpoint to mean an exclusive checkpoint, that is, a
checkpoint during which no WAL can be inserted.

  * one. This is not particularly critical, but since we may be
  * assigning a new TLI, using a shutdown checkpoint allows us to have
  * the rule that TLI only changes in shutdown checkpoints, which
  * allows some extra error checking in xlog_redo.

Relying on this, I suppose we can omit it if the latest checkpoint was
taken recently enough that crash recovery remains possible afterwards.
This condition could be secured by my other patch for
checkpoint_segments on the standby.

After applying this patch, the checkpoint after archive recovery near
the end of StartupXLOG() will be skipped when both of the following
hold:

- A WAL receiver has been launched at some point (per WalRcvStarted()).

- XLogCheckpointNeeded() against replayEndRecPtr says no checkpoint is
  needed.

What do you think about this?


This patch needs the WalRcvStarted() function introduced by my other patch.

http://archives.postgresql.org/pgsql-hackers/2012-06/msg00287.php

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0f2678c..48c0cf6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6905,9 +6905,41 @@ StartupXLOG(void)
 		 * allows some extra error checking in xlog_redo.
 		 */
 		if (bgwriterLaunched)
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-			  CHECKPOINT_IMMEDIATE |
-			  CHECKPOINT_WAIT);
+		{
+			bool do_checkpoint = true;
+
+			if (WalRcvStarted())
+			{
+				/*
+				 * This shutdown checkpoint on promotion retards failover
+				 * completion.  In spite of the rule for TLI and shutdown
+				 * checkpoints mentioned above, we want to skip this
+				 * checkpoint, securing recoverability via crash recovery
+				 * from this point on.
+				 */
+				uint32 replayEndId = 0;
+				uint32 replayEndSeg = 0;
+				XLogRecPtr replayEndRecPtr;
+				/* use volatile pointer to prevent code rearrangement */
+				volatile XLogCtlData *xlogctl = XLogCtl;
+
+				SpinLockAcquire(&xlogctl->info_lck);
+				replayEndRecPtr = xlogctl->replayEndRecPtr;
+				SpinLockRelease(&xlogctl->info_lck);
+				XLByteToSeg(replayEndRecPtr, replayEndId, replayEndSeg);
+				if (!XLogCheckpointNeeded(replayEndId, replayEndSeg))
+				{
+					do_checkpoint = false;
+					ereport(LOG,
+							(errmsg("Checkpoint on recovery end was skipped")));
+				}
+			}
+			
+			if (do_checkpoint)
+				RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+								  CHECKPOINT_IMMEDIATE |
+								  CHECKPOINT_WAIT);
+		}
 		else
 			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
 



Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-06-08 Thread Simon Riggs
On 25 April 2012 10:40, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I tried to implement a patch according to the idea. It allows extensions
 to register an entry point for self-managed daemon processes; the
 postmaster then starts and stops them in the normal manner.

The patch needs much work yet, but has many good ideas.

There doesn't seem to be a place where we pass the parameter to say
which one of the multiple daemons a particular process should become.
It would be helpful for testing to make the example module call 2
daemons each with slightly different characteristics or parameters, so
we can test the full function of the patch.

I think it's essential that we allow these processes to execute SQL, so
we must correctly initialise them as backends and set up signalling.
That also means we need a parameter to limit the number of such
processes.

Also, I prefer to call these bgworker processes, which is more similar
to the autovacuum worker and bgwriter naming. That also gives a clue as
to how to set up signalling etc.

I don't think we should allow these processes to override sighup and
sigterm. Signal handling should be pretty standard, just as it is with
normal backends.
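
For illustration, registration might look something like the sketch
below. The names (BgWorker, RegisterBgWorker) are hypothetical, not the
posted patch's actual API; the point is the shape: an entry point plus
an argument saying which of several daemons the child should become.

typedef struct BgWorker
{
    const char *bgw_name;               /* shown in ps and logs */
    void        (*bgw_main) (void *);   /* entry point run in the child */
    void       *bgw_arg;                /* which daemon to become */
} BgWorker;

static void
my_daemon_main(void *arg)
{
    /* standard SIGHUP/SIGTERM handling, then connect and execute SQL */
}

void
_PG_init(void)
{
    static BgWorker worker = {"my_daemon", my_daemon_main, NULL};

    RegisterBgWorker(&worker);          /* hypothetical registration call */
}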

I have a prototype that has some of these characteristics, so I see
our work as complementary.

At present, I don't think this patch would be committable in CF1, but
I'd like to make faster progress with it than that. Do you want to
work on this more, or would you like me to merge our prototypes into a
more likely candidate?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Cédric Villemain
On Friday, June 8, 2012 00:15:23, Tom Lane wrote:
 I am pleased to announce that Kevin Grittner has accepted the core
 committee's invitation to become our newest committer.  As you all
 know, Kevin's done a good deal of work on the project over the past
 couple of years.  We judged that he has the requisite skills,
 dedication to the project, and a suitable degree of caution to be
 a good committer.  Please join me in welcoming him aboard.

Very good news!
Congratulations, Kevin.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training




Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Simon Riggs
On 8 June 2012 09:14, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

 The requirements for this patch are as follows.

 - What I want is similar checkpoint-progression behavior between
  the master and a (hot) standby. Specifically, checkpoints during
  streaming replication should run at the speed governed by
  checkpoint_segments. The point of this patch is to keep an
  unexpectedly large number of WAL segments from accumulating on
  the standby side. (Plus, it increases the chance of skipping the
  recovery-end checkpoint with my other patch.)

Since we want wal_keep_segments number of WAL files on master (and
because of cascading, on standby also), I don't see any purpose to
triggering more frequent checkpoints just so we can hit a magic number
that is most often set wrong.

ISTM that we should avoid triggering a checkpoint on the master if
checkpoint_segments is less than wal_keep_segments. Such checkpoints
serve no purpose because we don't actually limit and recycle the WAL
files and all it does is slow people down.

Also, I don't believe that throwing more checkpoints makes it more
likely we can skip shutdown checkpoints at failover.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Boszormenyi Zoltan

2012-06-08 00:15 keltezéssel, Tom Lane írta:

I am pleased to announce that Kevin Grittner has accepted the core
committee's invitation to become our newest committer.  As you all
know, Kevin's done a good deal of work on the project over the past
couple of years.  We judged that he has the requisite skills,
dedication to the project, and a suitable degree of caution to be
a good committer.  Please join me in welcoming him aboard.

regards, tom lane



Congratulations!


--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-08 Thread Simon Riggs
On 8 June 2012 09:22, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

 I have a problem with promotion from hot standby: the exclusive
 checkpoint delays completion of the promotion.

Agreed, we have that problem.

 Relying on this, I suppose we can omit it if the latest checkpoint was
 taken recently enough that crash recovery remains possible afterwards.

I don't see any reason to special case this. If a checkpoint has no
work to do, then it will go very quickly. Why seek to speed it up even
further?

 This condition could be secured by my other patch for
 checkpoint_segments on the standby.

More frequent checkpoints are very unlikely to secure a condition that
no checkpoint at all is required at failover.

Making a change that has a negative effect for everybody, in the hope
of sometimes improving performance for something that is already fast,
doesn't seem like a good trade-off to me.

Regrettably, the line of thought explained here does not seem useful to me.

As you know, I was working on avoiding shutdown checkpoints completely
myself. You are welcome to work on the approach Fujii and I discussed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-08 Thread Robert Haas
On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote:
 However, not throwing errors on the URL syntax should be considered a
 bug, I think.

 +1.

 +1

 Here's a patch that just makes the thing an error.  Of course we could
 revert it if it makes the URI feature otherwise unusable...but I don't
 see a huge and terrible blocker ATM.  A major question mark for me is any
 extra stuff in JDBC URLs.

It looks like the answer is yes.

http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

...but I'm inclined to think we should make this change anyway.  If
JDBC used libpq, then it might be nice to let JDBC parse out bits of
the URL and then pass the whole thing, unmodified, through to libpq,
without having libpq spit up.  But it doesn't.  And even if someone
were inclined to try to do something of that type, the warnings we're
emitting now would presumably discourage them.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-08 Thread Magnus Hagander
On Fri, Jun 8, 2012 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote:
 However, not throwing errors on the URL syntax should be considered a
 bug, I think.

 +1.

 +1

 Here's a patch that just makes the thing an error.  Of course we could
 revert it if it makes the URI feature otherwise unusable...but I don't
 see a huge and terrible blocker ATM.  A major question mark for me is any
 extra stuff in JDBC URLs.

 It looks like the answer is yes.

 http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

 ...but I'm inclined to think we should make this change anyway.  If
 JDBC used libpq, then it might be nice to let JDBC parse out bits of
 the URL and then pass the whole thing, unmodified, through to libpq,
 without having libpq spit up.  But it doesn't.  And even if someone
 were inclined to try to do something of that type, the warnings we're
 emitting now would presumably discourage them.

 Thoughts?

I think we *have* to make the change for regular parameters, for
security reasons.

What we do with prefixed parameters can be debated... But we'll have
to pass those to the server anyway for validation, so it might be an
uninteresting case.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Robert Haas
On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It seems that in implementing ginbuildempty(), I falsified the first
 note in the header comment for log_newpage():

  * Note: all current callers build pages in private memory and write them
  * directly to smgr, rather than using bufmgr.  Therefore there is no need
  * to pass a buffer ID to XLogInsert, nor to perform MarkBufferDirty within
  * the critical section.

 1. Considering that we're logging the entire page, is it necessary (or
 even desirable) to include the buffer ID in the rdata structure?  If
 so, why?  To put that another way, does my abuse of log_newpage
 constitute a bug in gistbuildempty()?

 AFAICS, not passing the buffer ID to XLogInsert is not an issue, since
 we are logging the whole page in any case.  However, failing to perform
 MarkBufferDirty within the critical section definitely is an issue.

However, I'm not failing to do that: there's an enclosing critical section.

 2. Should we add a new function that does the same thing as
 log_newpage for a shared buffer?  I'm imagining that the signature
 would be:

 Either that or rethink building this data in shared buffers.  What's the
 point of that, exactly, for a page that we are most certainly not going
 to use in normal operation?

Well, right.  I mean, I think the current implementation is mostly
design-by-accident, and my first thought was the same as yours: don't
clutter shared_buffers with pages we don't actually need for anything.
 But there is a down side to doing it the other way, too.  Look at
what btbuildempty does:

    /* Write the page.  If archiving/streaming, XLOG it. */
    smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
              (char *) metapage, true);
    if (XLogIsNeeded())
        log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
                    BTREE_METAPAGE, metapage);

    /*
     * An immediate sync is required even if we xlog'd the page, because the
     * write did not go through shared_buffers and therefore a concurrent
     * checkpoint may have moved the redo pointer past our xlog record.
     */
    smgrimmedsync(index->rd_smgr, INIT_FORKNUM);

So we have to write the page out immediately, then we have to XLOG it,
and then even though we've XLOG'd it, we still have to fsync it
immediately.  It might be better to go through shared_buffers, which
would allow the write and fsync to happen in the background.  The
cache-poisoning effect is probably trivial  - even on a system with
32MB of shared_buffers there are thousands of pages, and init forks
are never going to consume more than a handful unless you're creating
an awful lot of unlogged relations very quickly - in which case maybe
avoiding the immediate fsyncs is a more pressing concern.  On the
other hand, we haven't had any complaints about the way it works now,
either.  What's your thought?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 7:53 AM, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Jun 8, 2012 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 6:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 However, not throwing errors on the URL syntax should be considered a
 bug, I think.

 +1.

 +1

 Here's a patch that just makes the thing an error.  Of course we could
 revert it if it makes the URI feature otherwise unusable...but I don't
 see a huge and terrible blocker ATM.  A major question mark for me is any
 extra stuff in JDBC URLs.

 It looks like the answer is yes.

 http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

 ...but I'm inclined to think we should make this change anyway.  If
 JDBC used libpq, then it might be nice to let JDBC parse out bits of
 the URL and then pass the whole thing, unmodified, through to libpq,
 without having libpq spit up.  But it doesn't.  And even if someone
 were inclined to try to do something of that type, the warnings we're
 emitting now would presumably discourage them.

 Thoughts?

 I think we *have* to make the change for regular parameters, for
 security reasons.

 What we do with prefixed parameters can be debated... But we'll have
 to pass those to the server anyway for validation, so it might be an
 uninteresting case.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS, not passing the buffer ID to XLogInsert is not an issue, since
 we are logging the whole page in any case.  However, failing to perform
 MarkBufferDirty within the critical section definitely is an issue.

 However, I'm not failing to do that: there's an enclosing critical section.

Mph.  But is it being done in the right order relative to the other XLOG
related steps?  See the code sketch in transam/README.

 So we have to write the page out immediately, then we have to XLOG it,
 and then even though we've XLOG'd it, we still have to fsync it
 immediately.  It might be better to go through shared_buffers, which
 would allow the write and fsync to happen in the background.

Well, that's a fair point, but on the other hand we've not had any
complaints traceable to the cost of making init forks.

On the whole, I don't care for the idea that the
modify-and-xlog-a-buffer sequence is being split between log_newpage and
its caller.  That sounds like bugs waiting to happen anytime somebody
refactors XLOG operations.  It would probably be best to refactor this
as you're suggesting, so that that becomes less nonstandard.

regards, tom lane



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 09:14, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp 
 wrote:

 The requirements for this patch are as follows.

 - What I want is similar checkpoint-progression behavior between
  the master and a (hot) standby. Specifically, checkpoints during
  streaming replication should run at the speed governed by
  checkpoint_segments. The point of this patch is to keep an
  unexpectedly large number of WAL segments from accumulating on
  the standby side. (Plus, it increases the chance of skipping the
  recovery-end checkpoint with my other patch.)

 Since we want wal_keep_segments number of WAL files on master (and
 because of cascading, on standby also), I don't see any purpose to
 triggering more frequent checkpoints just so we can hit a magic number
 that is most often set wrong.

This is a good point.  Right now, if you set checkpoint_segments to a
large value, we retain lots of old WAL segments even when the system
is idle (cf. XLOGfileslop).  I think we could be smarter about that.
I'm not sure what the exact algorithm should be, but right now users
are forced to choose between setting checkpoint_segments very large to
achieve optimum write performance and setting it small to conserve
disk space.  What would be much better, IMHO, is if the number of retained
segments could ratchet down when the system is idle, eventually
reaching a state where we keep only one segment beyond the one
currently in use.

For example, suppose I have checkpoint_timeout=10min and
checkpoint_segments=300.  If, five minutes into the ten-minute
checkpoint interval, I've only used 10 WAL segments, then I probably
am not going to need another 290 of them in the remaining five
minutes.  We ought to keep, say, 20 in that case (number we expect to
need * 2, similar to bgwriter_lru_multiplier) and delete the rest.
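
In rough C, with those numbers worked in (a sketch; the function and
its parameters are illustrative, not an actual patch):

/*
 * Sketch: how many extra WAL segments to retain, based on consumption
 * so far in the current checkpoint interval.  With 10 segments used
 * halfway through, we project 10 more for the rest of the interval
 * and retain 20 (projection * 2, cf. bgwriter_lru_multiplier).
 */
static int
SegmentsToRetain(int segs_used_so_far, double interval_elapsed_frac)
{
    double      projected_rest;

    if (interval_elapsed_frac <= 0.0)
        return CheckPointSegments;      /* no data yet; be conservative */

    projected_rest = segs_used_so_far *
        (1.0 - interval_elapsed_frac) / interval_elapsed_frac;

    /* never retain more than checkpoint_segments would today */
    return Min((int) (projected_rest * 2.0), CheckPointSegments);
}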

If we did that, people could set checkpoint_segments much higher to
handle periods of peak load without continuously consuming large
amounts of space with old, useless WAL segments.  It doesn't end up
working very well anyway because the old WAL segments are no longer in
cache by the time we go to overwrite them.

 ISTM that we should avoid triggering a checkpoint on the master if
 checkpoint_segments is less than wal_keep_segments. Such checkpoints
 serve no purpose because we don't actually limit and recycle the WAL
 files and all it does is slow people down.

On the other hand, I emphatically disagree with this, for the same
reasons as on the other thread.  Getting data down to disk provides a
greater measure of safety than having it in memory.  Making
checkpoint_segments not force a checkpoint is no better than making
checkpoint_timeout not force a checkpoint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 9:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 7, 2012 at 8:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS, not passing the buffer ID to XLogInsert is not an issue, since
 we are logging the whole page in any case.  However, failing to perform
 MarkBufferDirty within the critical section definitely is an issue.

 However, I'm not failing to do that: there's an enclosing critical section.

 Mph.  But is it being done in the right order relative to the other XLOG
 related steps?  See the code sketch in transam/README.

It appears to me that it is being done in the right order.

 So we have to write the page out immediately, then we have to XLOG it,
 and then even though we've XLOG'd it, we still have to fsync it
 immediately.  It might be better to go through shared_buffers, which
 would allow the write and fsync to happen in the background.

 Well, that's a fair point, but on the other hand we've not had any
 complaints traceable to the cost of making init forks.

 On the whole, I don't care for the idea that the
 modify-and-xlog-a-buffer sequence is being split between log_newpage and
 its caller.  That sounds like bugs waiting to happen anytime somebody
 refactors XLOG operations.  It would probably be best to refactor this
 as you're suggesting, so that that becomes less nonstandard.

OK.  So what I'm thinking is that we should add a new function that
takes a relfilenode and a buffer and performs steps 4-6 of what's
described in transam/README: mark the buffer dirty, xlog it, and set
the LSN and TLI.  We might want to have this function assert that it
is in a critical section, for the avoidance of error.  Then anyone who
wants to use it can do steps 1-3, call the function, and then finish
up with steps 7-8.  I don't think we can cleanly encapsulate any more
than that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Simon Riggs
On 8 June 2012 14:47, Robert Haas robertmh...@gmail.com wrote:

 ISTM that we should avoid triggering a checkpoint on the master if
 checkpoint_segments is less than wal_keep_segments. Such checkpoints
 serve no purpose because we don't actually limit and recycle the WAL
 files and all it does is slow people down.

 On the other hand, I emphatically disagree with this, for the same
 reasons as on the other thread.  Getting data down to disk provides a
 greater measure of safety than having it in memory.  Making
 checkpoint_segments not force a checkpoint is no better than making
 checkpoint_timeout not force a checkpoint.

Not sure which bit you are disagreeing with. I have no suggested
change to checkpoint_timeout.

What I'm saying is that forcing a checkpoint to save space, when we
aren't going to save space, makes no sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 9:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 14:47, Robert Haas robertmh...@gmail.com wrote:

 ISTM that we should avoid triggering a checkpoint on the master if
 checkpoint_segments is less than wal_keep_segments. Such checkpoints
 serve no purpose because we don't actually limit and recycle the WAL
 files and all it does is slow people down.

 On the other hand, I emphatically disagree with this, for the same
 reasons as on the other thread.  Getting data down to disk provides a
 greater measure of safety than having it in memory.  Making
 checkpoint_segments not force a checkpoint is no better than making
 checkpoint_timeout not force a checkpoint.

 Not sure which bit you are disagreeing with. I have no suggested
 change to checkpoint_timeout.

You already made it not a hard timeout.  We have another nearby thread
discussing why I don't like that.

 What I'm saying is that forcing a checkpoint to save space, when we
 aren't going to save space, makes no sense.

We are also forcing a checkpoint to limit recovery time and data loss
potential, not just to save space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Simon Riggs
On 8 June 2012 15:21, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 8, 2012 at 9:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 14:47, Robert Haas robertmh...@gmail.com wrote:

 ISTM that we should avoid triggering a checkpoint on the master if
 checkpoint_segments is less than wal_keep_segments. Such checkpoints
 serve no purpose because we don't actually limit and recycle the WAL
 files and all it does is slow people down.

 On the other hand, I emphatically disagree with this, for the same
 reasons as on the other thread.  Getting data down to disk provides a
 greater measure of safety than having it in memory.  Making
 checkpoint_segments not force a checkpoint is no better than making
 checkpoint_timeout not force a checkpoint.

 Not sure which bit you are disagreeing with. I have no suggested
 change to checkpoint_timeout.

 You already made it not a hard timeout.  We have another nearby thread
 discussing why I don't like that.

 What I'm saying is that forcing a checkpoint to save space, when we
 aren't going to save space, makes no sense.

 We are also forcing a checkpoint to limit recovery time and data loss
 potential, not just to save space.

Nothing I've said on this thread is related to the other thread.
Please don't confuse matters.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-08 Thread Robert Haas
On Thu, Jun 7, 2012 at 9:25 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The only risk of data loss is in the case where someone deletes their
 pg_xlog and didn't take a backup in all that time, which is hardly
 recommended behaviour. We're at exactly the same risk of data loss if
 someone deletes their pg_clog. Too frequent checkpoints actually makes
 the data loss risk from deleted pg_clog greater, so the balance of
 data loss risk doesn't seem to have altered.

This doesn't match my experience.  pg_xlog is often located on a
separate disk, which significantly increases the chances of something
bad happening to it, either through user error or because, uh, disks
sometimes fail.  Now, granted, you can also lose your data directory
(including pg_clog) this way, but just because we lose data in that
situation doesn't mean we should be happy about also losing data when
pg_xlog goes down the toilet, especially when we can easily prevent it
by going back to the behavior we've had in every previous release.

Now, I have had customers lose pg_clog data, and it does suck, but
it's usually a safe bet that most of the missing transactions
committed, so you can pad out the missing files with 0x55, and
probably get your data back.   On the other hand, it's impossible to
guess what any missing pg_xlog data might have been.  Perhaps if the
data pages are on disk and only CLOG didn't get written you could
somehow figure out which bits you need to flip in CLOG to get your
data back, but that's pretty heavy brain surgery, and if autovacuum or
even just a HOT prune runs before you realize that you need to do it
then you're toast.  OTOH, if the database has checkpointed,
pg_resetxlog is remarkably successful in letting you pick up the
pieces and go on with your life.
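
The 0x55 trick works because clog stores two status bits per
transaction and 01 means committed, so a byte of 0x55 (binary
01010101) marks four transactions committed.  A sketch of such a
padding tool (illustrative only; a clog segment is 32 pages of 8kB,
and note this marks genuinely aborted transactions committed too):

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    char    page[8192];
    FILE   *f;
    int     i;

    /* usage (hypothetical): pad_clog pg_clog/0000 */
    if (argc != 2 || (f = fopen(argv[1], "wb")) == NULL)
        return 1;

    memset(page, 0x55, sizeof(page));
    for (i = 0; i < 32; i++)    /* SLRU_PAGES_PER_SEGMENT */
        fwrite(page, 1, sizeof(page), f);

    fclose(f);
    return 0;
}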

All that having been said, it wouldn't be a stupid idea to have a
little more redundancy in our CLOG mechanism than we do right now.
Hint bits help, as does the predictability of the data, but it's still
awfully scary to have that much critical data packed into that
small a space.  I'd love to see us checksum those pages, or store the
data in some redundant location that makes it unlikely we'll lose both
copies, or ship a utility that will scan all your heap pages and try
to find hint bits that reveal which transactions committed and which
ones aborted, or all of the above.  But until then, I'd like to make
sure that we at least have the data on the disk instead of sitting
dirty in memory forever.

As a general thought about disaster recovery, my experience is that if
you can tell a customer to run a command (like pg_resetxlog), or - not
quite as good - if you can tell them to run some script that you email
them (like my pad-out-the-CLOG-with-0x55 script), then they're willing
to do that, and it usually works, and they're as happy as they're
going to be.  But if you tell them that they have to send you all
their data files or let you log into the machine and poke around for
$X/hour * many hours, then they typically don't want to do that.
Sometimes it's legally or procedurally impossible for them; even if
not, it's cheaper to find some other way to cope with the situation,
so they do, but now -  the way they view it - the database lost their
data.  Even if the problem was entirely self-inflicted, like an
intentional deletion of pg_xlog, and even if they therefore understand
that it was entirely their own stupid fault that the data got eaten,
it's a bad experience.  For that reason, I think we should be looking
for opportunities to increase the recoverability of the database in
every area.  I'm sure that everyone on this list who works with
customers on a regular basis has had customers who lost pg_xlog, who
lost pg_clog (or portions thereof), who dropped their main table, who
lost the backing files for pg_class and/or pg_attribute, whose
database ended up in lost+found, who had a break in WAL, who had
individual blocks corrupted or unreadable within some important table,
who were missing TOAST chunks, who took a pg_basebackup and failed to
create recovery.conf, who had a corrupted index on a critical system
table, who had inconsistent system catalog contents.  Some of these
problems are caused by bad hardware or bugs, but the most common cause
is user error.  Regardless of the cause, the user wants to get as much
of their data back as possible as quickly and as easily and as
reliably as possible.  To the extent that we can transform
situations that would have required consulting hours into situations
from which a semi-automated recovery is possible, or situations that
would have required many consulting hours into ones that require only
a few, that's a huge win.  Of course, we shouldn't place that goal
above all else; and of course, this is only one small piece of that.
But it is a piece, and it has a tangible benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-08 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 if the database has checkpointed
 
I haven't been entirely clear on the risks Tom and Robert are concerned
about; is it a question of whether we change the meaning of these
settings to something more complicated than the following?
 
checkpoint_segments (integer)
Maximum number of log file segments between automatic WAL
checkpoints
 
checkpoint_timeout (integer)
Maximum time between automatic WAL checkpoints
 
I can see possibly changing the latter when absolutely nothing has
been written to WAL since the last checkpoint, although I'm not sure
that should suppress flushing dirty pages (e.g., from hinting) to
disk.  Such a change seems like it would be of pretty minimal
benefit, though, and not something for which it is worth taking on
any risk at all.  Any other change to the semantics of these
settings seems ill advised on the face of it.
 
... or am I not grasping the issue properly?
 
-Kevin



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Florian Pflug
On Jun8, 2012, at 15:47 , Robert Haas wrote:
 On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 09:14, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp 
 wrote:
 
 The requirements for this patch are as follows.

 - What I want is similar checkpoint-progression behavior between
  the master and a (hot) standby. Specifically, checkpoints during
  streaming replication should run at the speed governed by
  checkpoint_segments. The point of this patch is to keep an
  unexpectedly large number of WAL segments from accumulating on
  the standby side. (Plus, it increases the chance of skipping the
  recovery-end checkpoint with my other patch.)
 
 Since we want wal_keep_segments number of WAL files on master (and
 because of cascading, on standby also), I don't see any purpose to
 triggering more frequent checkpoints just so we can hit a magic number
 that is most often set wrong.
 
 This is a good point.  Right now, if you set checkpoint_segments to a
 large value, we retain lots of old WAL segments even when the system
 is idle (cf. XLOGfileslop).  I think we could be smarter about that.
 I'm not sure what the exact algorithm should be, but right now users
 are forced to choose between setting checkpoint_segments very large to achieve
 optimum write performance and setting it small to conserve disk space.
 What would be much better, IMHO, is if the number of retained
 segments could ratchet down when the system is idle, eventually
 reaching a state where we keep only one segment beyond the one
 currently in use.

I'm a bit sceptical about this. It seems to me that you wouldn't actually
be able to do anything useful with the conserved space, since postgres
could reclaim it at any time. At which point it'd better be available,
or your whole cluster comes to a screeching halt...

best regards,
Florian Pflug




Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 12:24 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I haven't been exactly clear on the risks about which Tom and Robert
 have been concerned; is it a question about whether we change the
 meaning of these settings to something more complicated?:

 checkpoint_segments (integer)
    Maximum number of log file segments between automatic WAL
    checkpoints

 checkpoint_timeout (integer)
    Maximum time between automatic WAL checkpoints

The issue is that, in the tip of the 9.2 branch, checkpoint_timeout is
no longer the maximum time between automatic WAL checkpoints.
Instead, the checkpoint is skipped if we're still in the same WAL
segment that we were in when we did the last checkpoint.  Therefore,
there is absolutely no upper bound on the amount of time that can pass
between checkpoints.  If someone does one transaction, which happens
not to cross a WAL segment boundary, we will never automatically
checkpoint that transaction.  A checkpoint will ONLY be triggered when
we have enough write-ahead log volume to get us into the next segment.
 I am arguing (and Tom is now agreeing) that this is bad, and that the
patch which made this change needs either some kind of fix, or to be
reverted completely.

The original motivation for the patch was that the code to suppress
duplicate checkpoints stopped working correctly when Hot Standby was
committed.  The previous coding (before the commit at issue) skips a
checkpoint if no write-ahead log records at all have been emitted
since the start of the preceding checkpoint.  I believe this is the
correct behavior, but there's a problem:  when wal_level =
hot_standby, we emit an XLOG_RUNNING_XACTS record during every
checkpoint cycle.  So, if wal_level = hot_standby, the test for
whether anything has happened always returns false, and so the system
never quiesces: every checkpoint cycle contains at least the
XLOG_RUNNING_XACTS record, even if nothing else, so we never get to
skip any checkpoints.  When wal_level < hot_standby, the problem does
not exist and redundant checkpoints are suppressed just as we would
hope.
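
For concreteness, the old test in CreateCheckPoint looked roughly like
this (paraphrased from 9.1-era xlog.c, where XLogRecPtr was still a
two-field struct):

/*
 * Skip the checkpoint if nothing has been inserted since the last
 * one: the insert position sits exactly at the end of the previous
 * checkpoint record, and that record is its own redo pointer.  One
 * XLOG_RUNNING_XACTS record per cycle is enough to defeat this.
 */
if (curInsert.xlogid == ControlFile->checkPoint.xlogid &&
    curInsert.xrecoff == ControlFile->checkPoint.xrecoff +
        MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) &&
    ControlFile->checkPoint.xlogid == ControlFile->checkPointCopy.redo.xlogid &&
    ControlFile->checkPoint.xrecoff == ControlFile->checkPointCopy.redo.xrecoff)
{
    /* nothing of note has happened; skip the checkpoint */
}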

While Simon's patch does fix the problem, I believe that making
checkpoint_timeout anything less than a hard timeout is unwise.  The
previous behavior - at least one checkpoint per checkpoint_timeout -
is easy to understand and plan for; I believe the new behavior will be
an unpleasant surprise for users who care about checkpointing
regularly, which I think most do, whether they are here to be
represented in this conversation or not.  So I think we need a
different fix for the problem that wal_level = hot_standby defeats the
redundant-checkpoint-detection code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 1:01 PM, Florian Pflug f...@phlo.org wrote:
 On Jun8, 2012, at 15:47 , Robert Haas wrote:
 On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 09:14, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp 
 wrote:

 The requirements for this patch are as follows.

 - What I want is similar checkpoint-progression behavior between
  the master and a (hot) standby. Specifically, checkpoints during
  streaming replication should run at the speed governed by
  checkpoint_segments. The point of this patch is to keep an
  unexpectedly large number of WAL segments from accumulating on
  the standby side. (Plus, it increases the chance of skipping the
  recovery-end checkpoint with my other patch.)

 Since we want wal_keep_segments number of WAL files on master (and
 because of cascading, on standby also), I don't see any purpose to
 triggering more frequent checkpoints just so we can hit a magic number
 that is most often set wrong.

 This is a good point.  Right now, if you set checkpoint_segments to a
 large value, we retain lots of old WAL segments even when the system
 is idle (cf. XLOGfileslop).  I think we could be smarter about that.
 I'm not sure what the exact algorithm should be, but right now users
 are forced to choose between setting checkpoint_segments very large to achieve
 optimum write performance and setting it small to conserve disk space.
 What would be much better, IMHO, is if the number of retained
 segments could ratchet down when the system is idle, eventually
 reaching a state where we keep only one segment beyond the one
 currently in use.

 I'm a bit sceptical about this. It seems to me that you wouldn't actually
 be able to do anything useful with the conserved space, since postgres
 could reclaim it at any time. At which point it'd better be available,
 or your whole cluster comes to a screeching halt...

Well, the issue for me is elasticity.  Right now we ship with
checkpoint_segments=3.  That causes terrible performance on many
real-world workloads.  But say we ship with checkpoint_segments = 100,
which is a far better setting from a performance point of view.  Then
pg_xlog space utilization will eventually grow to more than 3 GB, even
on a low-velocity system where they don't improve performance.  I'm
not sure whether it's useful for the number of checkpoint segments to
vary dramatically on a single system, but I do think it would be very
nice if we could ship with a less conservative default without eating
up so much disk space.  Maybe there's a better way of going about
that, but I agree with Simon's point that the setting is often wrong.
Frequently it's too low; sometimes it's too high; occasionally it's
got both problems simultaneously.  If you have another idea on how to
improve this, I'm all ears.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote:
 OK.  So what I'm thinking is that we should add a new function that
 takes a relfilenode and a buffer and performs steps 4-6 of what's
 described in transam/README: mark the buffer dirty, xlog it, and set
 the LSN and TLI.  We might want to have this function assert that it
 is in a critical section, for the avoidance of error.  Then anyone who
 wants to use it can do steps 1-3, call the function, and then finish
 up with steps 7-8.  I don't think we can cleanly encapsulate any more
 than that.

On further review, I think that we ought to make MarkBufferDirty() the
caller's job, because sometimes we may need to xlog only if
XLogIsNeeded(), but the buffer's got to get marked dirty either way.
So I think the new function should just do step 5 - emit XLOG and set
LSN/TLI.
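
A sketch of what that helper might look like, modeled on the existing
log_newpage() (heap rmgr record format assumed; the caller holds the
critical section and has already called MarkBufferDirty):

XLogRecPtr
log_newpage_buffer(Buffer buffer)
{
    xl_heap_newpage xlrec;
    XLogRecPtr  recptr;
    XLogRecData rdata[2];
    Page        page = BufferGetPage(buffer);

    /* We should be in a critical section. */
    Assert(CritSectionCount > 0);

    BufferGetTag(buffer, &xlrec.node, &xlrec.forknum, &xlrec.blkno);

    rdata[0].data = (char *) &xlrec;
    rdata[0].len = SizeOfHeapNewpage;
    rdata[0].buffer = InvalidBuffer;
    rdata[0].next = &(rdata[1]);

    rdata[1].data = (char *) page;
    rdata[1].len = BLCKSZ;
    rdata[1].buffer = InvalidBuffer;
    rdata[1].next = NULL;

    recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);

    PageSetLSN(page, recptr);
    PageSetTLI(page, ThisTimeLineID);

    return recptr;
}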

Proposed patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


log-newpage-buffer-v1.patch
Description: Binary data



[HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-08 Thread Amit kapila
This patch provides support for fallback_application_name in
contrib/pgbench, oid2name, and dblink.
Currently I have implemented it only for pgbench; the implementation is
the same as in psql.
Before creating a final patch, I wanted to check whether my direction
is what is expected from this Todo item.

I have done basic testing of the following 2 scenarios:
1. After the change, if we query pg_stat_activity during a pgbench run,
it displays the application name as pgbench.
2. It displays the application name in the log file as well.

Suggestions?
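
For reference, the psql approach amounts to passing the
fallback_application_name keyword through PQconnectdbParams(), so an
application_name set explicitly by the user still wins. A sketch for
pgbench's connection code (variable names assumed from pgbench.c):

/* sketch: replace the PQsetdbLogin() call so we can pass
 * fallback_application_name */
const char *keywords[] = {
    "host", "port", "user", "password", "dbname",
    "fallback_application_name", NULL
};
const char *values[] = {
    pghost, pgport, login, password, dbName,
    "pgbench", NULL
};

conn = PQconnectdbParams(keywords, values, 0 /* expand_dbname */);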



fallback_application_name.patch
Description: fallback_application_name.patch



Re: [HACKERS] Temporary tables under hot standby

2012-06-08 Thread Robert Haas
On Sun, Apr 29, 2012 at 4:02 PM, Noah Misch n...@leadboat.com wrote:
 On Tue, Apr 24, 2012 at 11:55:15PM -0400, Noah Misch wrote:
 Concerning everyone's favorite topic, how to name the new type of table, I
 liked Tom's proposal[1] to make CREATE TEMP TABLE retain current behavior and
 have CREATE GLOBAL TEMP TABLE and/or CREATE LOCAL TEMP TABLE request the new
 SQL-standard variety.  (I'd vote for using CREATE GLOBAL and retaining CREATE
 LOCAL for future expansion.)  As he mentions, to get there, we'd ideally 
 start
 by producing a warning instead of silently accepting GLOBAL as a noise word.
 Should we put such a warning into 9.2?

 Here is the change I'd make.

This is listed on the open items list.

I haven't ever heard anyone propose to redefine CREATE LOCAL TEMP
TABLE to mean anything different than CREATE TEMP TABLE, so I'm
disinclined to warn about that.

I would be more open to warning people about CREATE GLOBAL TEMP TABLE
- frankly, it's pretty wonky that we allow that but treat GLOBAL as a
noise word in the first place.  But I'm a little disinclined to have
the message speculate about what might happen in future versions of
PostgreSQL.  Such predictions don't have a very good track record of
being accurate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Temporary tables under hot standby

2012-06-08 Thread Simon Riggs
On 8 June 2012 18:26, Robert Haas robertmh...@gmail.com wrote:

 I would be more open to warning people about CREATE GLOBAL TEMP TABLE
 - frankly, it's pretty wonky that we allow that but treat GLOBAL as a
 noise word in the first place.  But I'm a little disinclined to have
 the message speculate about what might happen in future versions of
 PostgreSQL.  Such predictions don't have a very good track record of
 being accurate.

Agreed.

We should make any use of GLOBAL throw an ERROR: feature not yet
implemented, in preparation for whatever might one day happen. We don't
know the future, but we do know the present.
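
To illustrate the difference (the exact error wording is hypothetical):

-- Today, GLOBAL is accepted as a noise word; both statements create an
-- ordinary session-local temporary table:
CREATE TEMP TABLE t1 (a int);
CREATE GLOBAL TEMP TABLE t2 (a int);   -- same semantics as t1

-- Under the proposal, the second form would instead fail:
-- ERROR:  GLOBAL TEMPORARY TABLE is not implemented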

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread David Fetter
Kudos!

Well-earned :)

Cheers,
David.
On Thu, Jun 07, 2012 at 06:15:23PM -0400, Tom Lane wrote:
 I am pleased to announce that Kevin Grittner has accepted the core
 committee's invitation to become our newest committer.  As you all
 know, Kevin's done a good deal of work on the project over the past
 couple of years.  We judged that he has the requisite skills,
 dedication to the project, and a suitable degree of caution to be
 a good committer.  Please join me in welcoming him aboard.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread David E. Wheeler
On Jun 7, 2012, at 3:15 PM, Tom Lane wrote:

 Please join me in welcoming him aboard.

Woo, congrats, Kevin!

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checkpointer on hot standby runs without looking checkpoint_segments

2012-06-08 Thread Simon Riggs
On 8 June 2012 18:01, Florian Pflug f...@phlo.org wrote:

 What would be much better, IMHO, is if the number of retained
 segments could ratchet down when the system is idle, eventually
 reaching a state where we keep only one segment beyond the one
 currently in use.

 I'm a bit sceptical about this. It seems to me that you wouldn't actually
 be able to do anything useful with the conserved space, since postgres
 could reclaim it at any time. At which point it'd better be available,
 or your whole cluster comes to a screeching halt...

Agreed, I can't really see why you'd want to save space when the
database is slow at the expense of robustness and reliability when the
database speeds up.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Daniel Farina
On Thu, Jun 7, 2012 at 3:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I am pleased to announce that Kevin Grittner has accepted the core
 committee's invitation to become our newest committer.

I have 99 problems, but this ain't one.[0]

[0]: This is a song reference.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Temporary tables under hot standby

2012-06-08 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 On 8 June 2012 18:26, Robert Haas robertmh...@gmail.com wrote:
 
 I would be more open to warning people about CREATE GLOBAL TEMP
 TABLE - frankly, it's pretty wonky that we allow that but treat
 GLOBAL as a noise word in the first place.  But I'm a little
 disinclined to have the message speculate about what might happen
 in future versions of PostgreSQL.  Such predictions don't have a
 very good track record of being accurate.
 
 Agreed.
 
 We should make use of GLOBAL throw an ERROR: feature not yet
 implemented, in preparation for what might one day happen. We
 don't know the future but we do know the present.
 
+1
 
It has always bothered me that we support GLOBAL there without
coming anywhere near matching the semantics of GTTs.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Robert Haas
On Fri, Jun 8, 2012 at 1:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 8, 2012 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote:
 OK.  So what I'm thinking is that we should add a new function that
 takes a relfilenode and a buffer and performs steps 4-5 of what's
 described in transam/README: mark the buffer dirty, xlog it, and set
 the LSN and TLI.  We might want to have this function assert that it
 is in a critical section, for the avoidance of error.  Then anyone who
 wants to use it can do steps 1-3, call the function, and then finish
 up with steps 6-7.  I don't think we can cleanly encapsulate any more
 than that.

 On further review, I think that we ought to make MarkBufferDirty() the
 caller's job, because sometimes we may need to xlog only if
 XLogIsNeeded(), but the buffer's got to get marked dirty either way.
 So I think the new function should just do step 5 - emit XLOG and set
 LSN/TLI.

 Proposed patch attached.

Whee, testing is fun.  Second try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


log-newpage-buffer-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Christopher Browne
+1 indeed.

Very pleased to see this progression in the development team!


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Fujii Masao
On Fri, Jun 8, 2012 at 7:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I am pleased to announce that Kevin Grittner has accepted the core
 committee's invitation to become our newest committer.

Wow! Congrats, Kevin!

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Postgres committer: Kevin Grittner

2012-06-08 Thread Vik Reykja
On Thu, Jun 7, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I am pleased to announce that Kevin Grittner has accepted the core
 committee's invitation to become our newest committer.  As you all
 know, Kevin's done a good deal of work on the project over the past
 couple of years.  We judged that he has the requisite skills,
 dedication to the project, and a suitable degree of caution to be
 a good committer.  Please join me in welcoming him aboard.


Congrats, Kevin!  I think you'll make a wonderful addition to the core
team.


Re: [HACKERS] log_newpage header comment

2012-06-08 Thread Amit kapila
 On further review, I think that we ought to make MarkBufferDirty() the
 caller's job, because sometimes we may need to xlog only if
 XLogIsNeeded(), but the buffer's got to get marked dirty either way.

In the case where XLOG is not required, wouldn't the data be fsynced
directly?  If so, even MarkBufferDirty() would not be required.

 So I think the new function should just do step 5 - emit XLOG and set
 LSN/TLI.

In the log_newpage_buffer() API, since the buffer already contains the
page to be logged, can't it be assumed that the page is initialized, so
that there is no need to check PageIsNew before setting the LSN/TLI?
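
For context, the guard being discussed looks roughly like this
(illustrative sketch; PageSetTLI applies to releases that still store a
timeline ID in the page header):

/* Only an initialized page has a header that can be stamped. */
if (!PageIsNew(page))
{
    PageSetLSN(page, recptr);
    PageSetTLI(page, ThisTimeLineID);
}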



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers