Re: [HACKERS] Command to prune archive at restartpoints

2010-06-14 Thread Dimitri Fontaine
Fujii Masao masao.fu...@gmail.com writes:
 In SR, WAL files in the pg_xlog directory on the standby are recycled
 by every restartpoints. So your proposed function seems not to be helpful
 even if hot_standby = on.

Then I guess I'm at a loss here: what is the pg_archivecleanup utility
good for in a standby?

-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 12:21 +0200, Dimitri Fontaine wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  In SR, WAL files in the pg_xlog directory on the standby are recycled
  by every restartpoints. So your proposed function seems not to be helpful
  even if hot_standby = on.
 
 Then I guess I'm at a loss here: what is the pg_archivecleanup utility
 good for in a standby?

Cleaning the archive directory, not the pg_xlog directory.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-14 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 Cleaning the archive directory, not the pg_xlog directory.

Hence the choice of the directory where to act. I was slow on that,
sorry guys.

I guess my main problem here is that I still picture PostgreSQL has
being able to maintain an archive with no external script in the simple
case.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Thu, Jun 10, 2010 at 4:09 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Here's the code.

I haven't more than glanced at this, but +1 for committing it if
you're confident it DTRT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Sat, Jun 12, 2010 at 4:51 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Dimitri Fontaine dfonta...@hi-media.com writes:
 Also, should I try to send a patch implementing my proposal (internal
 command exposed as a function at the SQL level, and while at it, maybe
 the internal command pg_archive_bypass to mimic /usr/bin/true as an
 archive_command)?

 I had to have a try at it, even if quick and dirty. I've not tried to
 code the pg_archive_bypass internal command for lack of discussion, but
 I still think it would be great to have it.

 So here's a see my idea in code patch, that put the previous code by
 Simon into a backend function. As the goal was not to adapt the existing
 code intended as external to use the internal APIs, you'll find it quite
 ugly I'm sure.

 For example, this #define XLOG_DATA_FNAME_LEN has to go away, but that
 won't help having the idea accepted or not, and as I'm only warming up,
 I didn't tackle the problem. If you want me to do it, I'd appreciate
 some guidance as how to, though.

 It goes like this:

 dim=# select pg_switch_xlog();
  pg_switch_xlog
 
  0/198
 (1 row)

 dim=# select pg_archive_cleanup('0/198');
 DEBUG:  removing pg_xlog/0001
 DEBUG:  removing pg_xlog/00010001
  pg_archive_cleanup
 
  t
 (1 row)

 I hope you too will find this way of interfacing is easier to deal with
 for everybody (from code maintenance to user settings).

I'm a bit perplexed here.  The archive cleanup has to run on the
standby, not the master, right?  Whereas pg_switch_xlog() can only run
on the master.  The purpose of making this a standalone executable is
so that people who have, for example, multiple standbys, can customize
the logic without having to hack the backend.  Pushing this into the
backend would defeat that goal; plus, it wouldn't be usable at all for
people who aren't running Hot Standby.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I'm a bit perplexed here.  The archive cleanup has to run on the
 standby, not the master, right?  Whereas pg_switch_xlog() can only run
 on the master.

I used it just to show a possible use case, easy to grasp. Sorry if
that's confusing instead.

  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

In the simple cases, what you want to be able to easily choose is just
the first XLOG file you're NOT cleaning. And this is the only argument
you give the function. 

So you can either use the backend function as your internal command for
archive cleanup, or use a script that choose where to stop cleaning then
call it with that as an argument (it's SQL callable).

What it does is unlink the file. If that behavior doesn't suit you, it's
still possible to use an external command and tune some already proposed
scripts. I just don't see how an external binary has more to offer than
a backend function here. It's more code to maintain, it's harder to
setup for people, and if it does not suit you, you still have to make
you own script but you can not use what we ship easily (you have to get
the sources and code in C for that).

What I'm after is being able to tell people to just setup a GUC to a
given value, not to copy/paste a (perl or bash) script from the docs,
make it executable under their system, then test it and run it in
production. We can do better than that, and it's not even hard.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Sun, Jun 13, 2010 at 1:04 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm a bit perplexed here.  The archive cleanup has to run on the
 standby, not the master, right?  Whereas pg_switch_xlog() can only run
 on the master.

 I used it just to show a possible use case, easy to grasp. Sorry if
 that's confusing instead.

  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

 In the simple cases, what you want to be able to easily choose is just
 the first XLOG file you're NOT cleaning. And this is the only argument
 you give the function.

 So you can either use the backend function as your internal command for
 archive cleanup, or use a script that choose where to stop cleaning then
 call it with that as an argument (it's SQL callable).

 What it does is unlink the file. If that behavior doesn't suit you, it's
 still possible to use an external command and tune some already proposed
 scripts. I just don't see how an external binary has more to offer than
 a backend function here. It's more code to maintain, it's harder to
 setup for people, and if it does not suit you, you still have to make
 you own script but you can not use what we ship easily (you have to get
 the sources and code in C for that).

 What I'm after is being able to tell people to just setup a GUC to a
 given value, not to copy/paste a (perl or bash) script from the docs,
 make it executable under their system, then test it and run it in
 production. We can do better than that, and it's not even hard.

We're not going to make them cut/paste anything from the docs.  We're
going to provide a production-ready executable they can just use,
which should be installed (presumably, already with the correct
permissions) by their packaging system if they install
postgresql-contrib or the equivalent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Robert Haas robertmh...@gmail.com writes:
  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

 We're not going to make them cut/paste anything from the docs.  We're
 going to provide a production-ready executable they can just use,
 which should be installed (presumably, already with the correct
 permissions) by their packaging system if they install
 postgresql-contrib or the equivalent.

I still run against people not wanting to trust contrib. I still read
here from time to time that contrib's chapter is maintaining working
examples of extensibility, not maintaining production ready add-ons.

Other than that, you proposed something flexible and easy to customize,
and you end up with an executable binary that will only offer one
behavior (unlink), the only option is where to stop (%r).

The backend function I'm proposing uses the same option, but is easier
to call from a script, should you need to customize. You don't even have
to run the script locally or remember where is the XLOG directory of
that instance. You could operate over a JDBC connection, e.g.

I now realize that my proposal ain't helping if Streaming Replication is
filling the standby's pg_xlog and hot_standby = off. I don't remember
that SR rebuilds pg_xlog on the standby though, does it?

The proposed script will only cleanup XLOGDIR in fact, so if you use a
common archive elsewhere then you still need some external command not
provided by the project. So we still need the script example in the
docs.

I think that the pg_archivecleanup binary is a good solution, all the
more if not shipped in contrib, but that the SQL callable function is
better.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Andrew Dunstan



Dimitri Fontaine wrote:

I still read
here from time to time that contrib's chapter is maintaining working
examples of extensibility, not maintaining production ready add-ons.

  


Even if this were true, and I don't believe it is, ISTM the solution 
would be to have a utility command alongside the other utility commands 
like pg_controldata.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Fujii Masao
On Mon, Jun 14, 2010 at 3:51 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 I now realize that my proposal ain't helping if Streaming Replication is
 filling the standby's pg_xlog and hot_standby = off. I don't remember
 that SR rebuilds pg_xlog on the standby though, does it?

In SR, WAL files in the pg_xlog directory on the standby are recycled
by every restartpoints. So your proposed function seems not to be helpful
even if hot_standby = on.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-12 Thread Dimitri Fontaine
Dimitri Fontaine dfonta...@hi-media.com writes:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So to clean up all WAL files older than those needed by that base backup,
 you would simply copy-paste that location and call pg_cleanuparchive:

 pg_cleanuparchive /walarchive/ 0001002F

 Ok, idle though: what about having a superuser-only SRF doing the
 same?

So I'm looking at what it'd take to have that code, and it seems it
would be quite easy. I wonder if we want to return only a boolean
(command success status) or the list of files we're pruning (that's the
SRF part), but other than that, it's all about having the code provided
by Simon in another place and some internal command support.

Something strange though: I notice that the error and signal handling in
pgarch.c::pgarch_archiveXlog (lines 551 and following) and in
xlog.c::ExecuteRecoveryCommand (lines 3143 and following) are very
different for no reason that I can see.

Why is that?

Also, should I try to send a patch implementing my proposal (internal
command exposed as a function at the SQL level, and while at it, maybe
the internal command pg_archive_bypass to mimic /usr/bin/true as an
archive_command)?

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-12 Thread Dimitri Fontaine
Dimitri Fontaine dfonta...@hi-media.com writes:
 Also, should I try to send a patch implementing my proposal (internal
 command exposed as a function at the SQL level, and while at it, maybe
 the internal command pg_archive_bypass to mimic /usr/bin/true as an
 archive_command)?

I had to have a try at it, even if quick and dirty. I've not tried to
code the pg_archive_bypass internal command for lack of discussion, but
I still think it would be great to have it.

So here's a see my idea in code patch, that put the previous code by
Simon into a backend function. As the goal was not to adapt the existing
code intended as external to use the internal APIs, you'll find it quite
ugly I'm sure.

For example, this #define XLOG_DATA_FNAME_LEN has to go away, but that
won't help having the idea accepted or not, and as I'm only warming up,
I didn't tackle the problem. If you want me to do it, I'd appreciate
some guidance as how to, though.

It goes like this:

dim=# select pg_switch_xlog();
 pg_switch_xlog 

 0/198
(1 row)

dim=# select pg_archive_cleanup('0/198');
DEBUG:  removing pg_xlog/0001
DEBUG:  removing pg_xlog/00010001
 pg_archive_cleanup 

 t
(1 row)

I hope you too will find this way of interfacing is easier to deal with
for everybody (from code maintenance to user settings).

Regards,
-- 
dim

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 599,605  static bool read_backup_label(XLogRecPtr *checkPointLoc);
  static void rm_redo_error_callback(void *arg);
  static int	get_sync_bit(int method);
  
- 
  /*
   * Insert an XLOG record having the specified RMID and info bytes,
   * with the body of the record being the data chunk(s) described by
--- 599,604 
***
*** 3056,3062  not_available:
  }
  
  /*
!  * Attempt to execute an external shell command during recovery.
   *
   * 'command' is the shell command to be executed, 'commandName' is a
   * human-readable name describing the command emitted in the logs. If
--- 3055,3065 
  }
  
  /*
!  * Attempt to execute an external shell command during recovery, or an
!  * internal one if given one.
!  *
!  * There's only one supported internal commands today, which is named
!  * pg_archive_cleanup.
   *
   * 'command' is the shell command to be executed, 'commandName' is a
   * human-readable name describing the command emitted in the logs. If
***
*** 3094,3145  ExecuteRecoveryCommand(char *command, char *commandName, bool failOnSignal)
  	LWLockRelease(ControlFileLock);
  
  	/*
! 	 * construct the command to be executed
  	 */
! 	dp = xlogRecoveryCmd;
! 	endp = xlogRecoveryCmd + MAXPGPATH - 1;
! 	*endp = '\0';
  
! 	for (sp = command; *sp; sp++)
  	{
! 		if (*sp == '%')
  		{
! 			switch (sp[1])
  			{
! case 'r':
! 	/* %r: filename of last restartpoint */
! 	sp++;
! 	StrNCpy(dp, lastRestartPointFname, endp - dp);
! 	dp += strlen(dp);
! 	break;
! case '%':
! 	/* convert %% to a single % */
! 	sp++;
! 	if (dp  endp)
! 		*dp++ = *sp;
! 	break;
! default:
! 	/* otherwise treat the % as not special */
! 	if (dp  endp)
! 		*dp++ = *sp;
! 	break;
  			}
  		}
! 		else
! 		{
! 			if (dp  endp)
! *dp++ = *sp;
! 		}
! 	}
! 	*dp = '\0';
  
! 	ereport(DEBUG3,
! 			(errmsg_internal(executing %s \%s\, commandName, command)));
  
- 	/*
- 	 * execute the constructed command
- 	 */
- 	rc = system(xlogRecoveryCmd);
  	if (rc != 0)
  	{
  		/*
--- 3097,3164 
  	LWLockRelease(ControlFileLock);
  
  	/*
! 	 * if the command is internal, call the function
  	 */
! 	if( strcmp(command, pg_archive_cleanup) == 0 )
! 	{
! 		text *restart_filename = cstring_to_text(lastRestartPointFname);
! 		rc = DatumGetBool(DirectFunctionCall1(pg_archive_cleanup, 
! 			  restart_filename)) ? 0 : 1;
  
! 		ereport(DEBUG3,
! (errmsg_internal(calling %s \%s\, commandName, command)));
! 	}
! 	else
  	{
! 		/*
! 		 * construct the command to be executed
! 		 */
! 		dp = xlogRecoveryCmd;
! 		endp = xlogRecoveryCmd + MAXPGPATH - 1;
! 		*endp = '\0';
! 
! 		for (sp = command; *sp; sp++)
  		{
! 			if (*sp == '%')
  			{
! switch (sp[1])
! {
! 	case 'r':
! 		/* %r: filename of last restartpoint */
! 		sp++;
! 		StrNCpy(dp, lastRestartPointFname, endp - dp);
! 		dp += strlen(dp);
! 		break;
! 	case '%':
! 		/* convert %% to a single % */
! 		sp++;
! 		if (dp  endp)
! 			*dp++ = *sp;
! 		break;
! 	default:
! 		/* otherwise treat the % as not special */
! 		if (dp  endp)
! 			*dp++ = *sp;
! 		break;
! }
! 			}
! 			else
! 			{
! if (dp  endp)
! 	*dp++ = *sp;
  			}
  		}
! 		*dp = '\0';
  
! 		ereport(DEBUG3,
! (errmsg_internal(executing %s \%s\, commandName, command)));
! 
! 		/*
! 		 * execute the constructed command
! 		 */
! 		rc = system(xlogRecoveryCmd);
! 	}

Re: [HACKERS] Command to prune archive at restartpoints

2010-06-11 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So to clean up all WAL files older than those needed by that base backup,
 you would simply copy-paste that location and call pg_cleanuparchive:

 pg_cleanuparchive /walarchive/ 0001002F

Ok, idle though: what about having a superuser-only SRF doing the same?
So that we have internal command for simple case, and SRF for use in
scripts in more complex case.

 Of course, if there's a perl one-liner to do that, we can just put that in
 the docs and don't really need pg_cleanuparchive at all.

psql -c SELECT * FROM pg_cleanup_archive('0001002F');

 Therefore my take on this problem is to provide internal commands here,
 that maybe wouldn't need to be explicitly passed any argument. If
 they're internal they certainly can access to the information they need?

 You want more flexibility in more advanced cases. Like if you have multiple
 standbys sharing the archive, you only want to remove old WAL files after
 they're not needed by *any* of the standbys anymore. Doing the cleanup
 directly in the archive_cleanup_command would cause the old WAL files to be
 removed prematurely, but you could put a shell script there to store the
 location to a file, and call pg_cleanuparchive with the max() of the
 locations reported by all standby servers.

Yes you still need to support external commands. That was not at all
what I'm proposing: I'm just after having the simple case dead simple to
setup. Like you don't write any script.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 09/06/10 10:21, Simon Riggs wrote:

On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:


I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.

More importantly, we should include an example in the docs. I created
one the other day  when this was actually bothering me a bit (see
http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.


I submitted a patch to make the command pg_standby -a %r

That's a more portable solution, ISTM.

I'll commit that and fix the docs.


Huh, wait. There's no -a option in pg_standby, so I presume you're 
planning to add that too. I don't like confusing pg_standby into this, 
the docs are currently quite clear that if you want to use the built-in 
standby mode, you can't use pg_standby, and this would muddy the waters.


Maybe we could add a new pg_cleanuparchive binary, but we'll need some 
discussion...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Simon Riggs
On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote:
 On 09/06/10 10:21, Simon Riggs wrote:
  On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
 
  I prefer archive_cleanup_command. We should name things after their
  principal function, not an implementation detail, IMNSHO.
 
  More importantly, we should include an example in the docs. I created
  one the other day  when this was actually bothering me a bit (see
  http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
  That seemed to work ok, but maybe it's too long, and maybe people would
  prefer a shell script to perl.
 
  I submitted a patch to make the command pg_standby -a %r
 
  That's a more portable solution, ISTM.
 
  I'll commit that and fix the docs.
 
 Huh, wait. There's no -a option in pg_standby, so I presume you're 
 planning to add that too. I don't like confusing pg_standby into this, 
 the docs are currently quite clear that if you want to use the built-in 
 standby mode, you can't use pg_standby, and this would muddy the waters.

It won't kill us to change that sentence. pg_standby is only used now
within the cleanup command etc

pg_standby already contains the exact logic we need here. Having two
sets of code for the same thing isn't how we do things.

 Maybe we could add a new pg_cleanuparchive binary, but we'll need some 
 discussion...

Which will go nowhere, as we both already know.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote:
 On 09/06/10 10:21, Simon Riggs wrote:
  On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
 
  I prefer archive_cleanup_command. We should name things after their
  principal function, not an implementation detail, IMNSHO.
 
  More importantly, we should include an example in the docs. I created
  one the other day  when this was actually bothering me a bit (see
  http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
  That seemed to work ok, but maybe it's too long, and maybe people would
  prefer a shell script to perl.
 
  I submitted a patch to make the command pg_standby -a %r
 
  That's a more portable solution, ISTM.
 
  I'll commit that and fix the docs.

 Huh, wait. There's no -a option in pg_standby, so I presume you're
 planning to add that too. I don't like confusing pg_standby into this,
 the docs are currently quite clear that if you want to use the built-in
 standby mode, you can't use pg_standby, and this would muddy the waters.

 It won't kill us to change that sentence. pg_standby is only used now
 within the cleanup command etc

 pg_standby already contains the exact logic we need here. Having two
 sets of code for the same thing isn't how we do things.

 Maybe we could add a new pg_cleanuparchive binary, but we'll need some
 discussion...

 Which will go nowhere, as we both already know.

I have a feeling that I may be poking my nose into an incipient
shouting match, but FWIW I agree with Heikki that it would be
preferable to keep this separate from pg_standby.  Considering that
Andrew wrote this in 24 lines of Perl code (one-third of which are
basically just there for logging purposes), I'm not that worried about
code duplication, unless what we actually need is significantly more
complicated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Andrew Dunstan



Robert Haas wrote:

It won't kill us to change that sentence. pg_standby is only used now
within the cleanup command etc

pg_standby already contains the exact logic we need here. Having two
sets of code for the same thing isn't how we do things.



Well, we could factor out that part of the code so it could be used in 
two binaries. But ...



Maybe we could add a new pg_cleanuparchive binary, but we'll need some
discussion...
  

Which will go nowhere, as we both already know.



I have a feeling that I may be poking my nose into an incipient
shouting match, but FWIW I agree with Heikki that it would be
preferable to keep this separate from pg_standby.  Considering that
Andrew wrote this in 24 lines of Perl code (one-third of which are
basically just there for logging purposes), I'm not that worried about
code duplication, unless what we actually need is significantly more
complicated.

  


I think my logic needs a tiny piece of adjustment, to ignore the 
timeline segment of the file name. But that will hardly involve a great 
deal of extra code - just chop off the first 8 chars. It's not like the 
code for this in pg_standby.c is terribly complex.


The virtue of a perl script is that it's very easily customizable, e.g. 
you might only delete files if they are older than a certain age.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 17:38, Andrew Dunstan wrote:

I think my logic needs a tiny piece of adjustment, to ignore the
timeline segment of the file name.


I'm not sure you should ignore it. Presumably anything in an older 
timeline is indeed not required anymore and can be removed, and anything 
in a newer timeline... how did it get there? Seems safer not remove it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Andrew Dunstan



Heikki Linnakangas wrote:

On 10/06/10 17:38, Andrew Dunstan wrote:

I think my logic needs a tiny piece of adjustment, to ignore the
timeline segment of the file name.


I'm not sure you should ignore it. Presumably anything in an older 
timeline is indeed not required anymore and can be removed, and 
anything in a newer timeline... how did it get there? Seems safer not 
remove it.




Well, I was just following the logic in pg-standby.c:

   /*
* We ignore the timeline part of the XLOG segment 
identifiers

* in deciding whether a segment is still needed.  This
* ensures that we won't prematurely remove a segment from a
* parent timeline. We could probably be a little more
* proactive about removing segments of non-parent 
timelines,

* but that would be a whole lot more complicated.
*
* We use the alphanumeric sorting property of the filenames
* to decide which ones are earlier than the
* exclusiveCleanupFileName file. Note that this means files
* are not removed in the order they were originally 
written,

* in case this worries you.
*/
   if (strlen(xlde-d_name) == XLOG_DATA_FNAME_LEN 
   strspn(xlde-d_name, 0123456789ABCDEF)
   == XLOG_DATA_FNAME_LEN 
 strcmp(xlde-d_name + 8, exclusiveCleanupFileName + 8) 
 0)



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 22:24, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

Maybe we could add a new pg_cleanuparchive binary, but we'll need some
discussion...


Would this binary ever be used manually, not invoked by PostgreSQL? As
it depends on the %r option to be given and to be right, I don't think
so.


Hmm, actually it would be pretty handy. To make use of a base backup, 
you need all the WAL files following the one where pg_start_backup() was 
called. We create a .backup file in the archive to indicate that 
location, like:


0001002F.0020.backup

So to clean up all WAL files older than those needed by that base 
backup, you would simply copy-paste that location and call 
pg_cleanuparchive:


pg_cleanuparchive /walarchive/ 0001002F

Of course, if there's a perl one-liner to do that, we can just put that 
in the docs and don't really need pg_cleanuparchive at all.



Therefore my take on this problem is to provide internal commands here,
that maybe wouldn't need to be explicitly passed any argument. If
they're internal they certainly can access to the information they need?


You want more flexibility in more advanced cases. Like if you have 
multiple standbys sharing the archive, you only want to remove old WAL 
files after they're not needed by *any* of the standbys anymore. Doing 
the cleanup directly in the archive_cleanup_command would cause the old 
WAL files to be removed prematurely, but you could put a shell script 
there to store the location to a file, and call pg_cleanuparchive with 
the max() of the locations reported by all standby servers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Maybe we could add a new pg_cleanuparchive binary, but we'll need some
 discussion...

Would this binary ever be used manually, not invoked by PostgreSQL? As
it depends on the %r option to be given and to be right, I don't think
so.

Therefore my take on this problem is to provide internal commands here,
that maybe wouldn't need to be explicitly passed any argument. If
they're internal they certainly can access to the information they need?

As a user, I'd find it so much better to trust PostgreSQL for proposing
sane defaults. As a developer, you will certainly find it easier to
maintain, document and distribute.

While at it, the other internal command we need is pg_archive_bypass for
the archive_command so that windows users have the /usr/bin/true option
too.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Simon Riggs
On Thu, 2010-06-10 at 22:49 +0300, Heikki Linnakangas wrote:
 On 10/06/10 22:24, Dimitri Fontaine wrote:
  Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:
  Maybe we could add a new pg_cleanuparchive binary, but we'll need some
  discussion...
 
  Would this binary ever be used manually, not invoked by PostgreSQL? As
  it depends on the %r option to be given and to be right, I don't think
  so.
 
 Hmm, actually it would be pretty handy. To make use of a base backup, 
 you need all the WAL files following the one where pg_start_backup() was 
 called. We create a .backup file in the archive to indicate that 
 location, like:
 
 0001002F.0020.backup
 
 So to clean up all WAL files older than those needed by that base 
 backup, you would simply copy-paste that location and call 
 pg_cleanuparchive:
 
 pg_cleanuparchive /walarchive/ 0001002F

OK, sounds like we're on the same thought train.

Here's the code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


pg_archivecleanup.tar
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-09 Thread Simon Riggs
On Wed, 2010-06-09 at 11:18 +0900, Takahiro Itagaki wrote:
 Robert Haas robertmh...@gmail.com wrote:
  I think we're replacing restartpoint_command, not recovery_end_command.
 
 Ah, sorry. I did the same replacement for restartpoint_command
 in _, -, and camel case words.
 
 BTW, should we also have a release note for the command?
 I added a simple description for it in the patch.

I don't think so, its not a separate feature. It's a change as part of
SR.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-09 Thread Simon Riggs
On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:

 I prefer archive_cleanup_command. We should name things after their 
 principal function, not an implementation detail, IMNSHO.
 
 More importantly, we should include an example in the docs. I created 
 one the other day  when this was actually bothering me a bit (see 
 http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
  
 That seemed to work ok, but maybe it's too long, and maybe people would 
 prefer a shell script to perl.

I submitted a patch to make the command pg_standby -a %r

That's a more portable solution, ISTM.

I'll commit that and fix the docs.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Robert Haas
On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark gsst...@mit.edu wrote:
 On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:

 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.

 Would it be better to call this archive_cleanup_command? That might
 help people understand the need for and the use of this parameter.

 This is bikeshedding but fwiw I like Simon's suggestion.

So, this thread is hanging out on our list of open items for 9.0.  My
personal opinion on it is that I don't really care much one way or the
other.  archive_cleanup_command does seem easier to understand, but
restartpoint_command has the advantage of describing exactly when it
gets run from a technical perspective, which might be a good thing,
too.  Since nobody's felt motivated to do anything about this for two
and a half months and we've now been through two betas with it the way
it is, I'm inclined to say we should just leave it alone.  On the
other hand, both of the people who voted in favor of changing it are
committers, and if one of them feels like putting in the effort to
change it, it won't bother me much, except that I feel it should get
done RSN.  But one way or the other we need to make a decision and get
this off the list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Simon Riggs
On Tue, 2010-06-08 at 17:17 -0400, Robert Haas wrote:
 On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark gsst...@mit.edu wrote:
  On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
 
  One awkward omission in the new built-in standby mode, mainly used for
  streaming replication, is that there is no easy way to delete old
  archived files like you do with the %r parameter to restore_command.
 
  Would it be better to call this archive_cleanup_command? That might
  help people understand the need for and the use of this parameter.
 
  This is bikeshedding but fwiw I like Simon's suggestion.
 
 So, this thread is hanging out on our list of open items for 9.0.  My
 personal opinion on it is that I don't really care much one way or the
 other.  archive_cleanup_command does seem easier to understand, but
 restartpoint_command has the advantage of describing exactly when it
 gets run from a technical perspective, which might be a good thing,
 too.  Since nobody's felt motivated to do anything about this for two
 and a half months and we've now been through two betas with it the way
 it is, I'm inclined to say we should just leave it alone.  On the
 other hand, both of the people who voted in favor of changing it are
 committers, and if one of them feels like putting in the effort to
 change it, it won't bother me much, except that I feel it should get
 done RSN.  But one way or the other we need to make a decision and get
 this off the list.

Yes, restartpoint_command is exactly correct, and I do understand it; I
just don't think anyone else will. If there's another use for a
restartpoint_command other than for clearing up an archive, then it
would be sufficient to destroy the name change idea. 

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Andrew Dunstan



Robert Haas wrote:

On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark gsst...@mit.edu wrote:
  

On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote:


On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:

  

One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.


Would it be better to call this archive_cleanup_command? That might
help people understand the need for and the use of this parameter.
  

This is bikeshedding but fwiw I like Simon's suggestion.



So, this thread is hanging out on our list of open items for 9.0.  My
personal opinion on it is that I don't really care much one way or the
other.  archive_cleanup_command does seem easier to understand, but
restartpoint_command has the advantage of describing exactly when it
gets run from a technical perspective, which might be a good thing,
too.  Since nobody's felt motivated to do anything about this for two
and a half months and we've now been through two betas with it the way
it is, I'm inclined to say we should just leave it alone.  On the
other hand, both of the people who voted in favor of changing it are
committers, and if one of them feels like putting in the effort to
change it, it won't bother me much, except that I feel it should get
done RSN.  But one way or the other we need to make a decision and get
this off the list.

  


I prefer archive_cleanup_command. We should name things after their 
principal function, not an implementation detail, IMNSHO.


More importantly, we should include an example in the docs. I created 
one the other day  when this was actually bothering me a bit (see 
http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html). 
That seemed to work ok, but maybe it's too long, and maybe people would 
prefer a shell script to perl.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I prefer archive_cleanup_command. We should name things after their 
 principal function, not an implementation detail, IMNSHO.

Weak preference for archive_cleanup_command here.

 More importantly, we should include an example in the docs. I created 
 one the other day  when this was actually bothering me a bit (see 
 http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
  
 That seemed to work ok, but maybe it's too long, and maybe people would 
 prefer a shell script to perl.

Short is good.  Maybe you could remove the logging stuff from the
example.

As for the language choice, my first thought is +1 for perl over shell,
mainly because it might be directly useful to people on Windows while
shell never would be.  On the other hand, if it's possible to do a
useful one-liner in shell then let's do it that way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Andrew Dunstan



Tom Lane wrote:

As for the language choice, my first thought is +1 for perl over shell,
mainly because it might be directly useful to people on Windows while
shell never would be.  On the other hand, if it's possible to do a
useful one-liner in shell then let's do it that way.
  


I don't think it is, reasonably. But here is fairly minimal version that 
might suit the docs:


   use strict;
   my ($dir, $num) = @ARGV;
   foreach my $file (glob($dir/*))
   {
   my $name = basename($file);
   unlink $file if (-f $file  $name =~ /^[0-9A-Z]{24}$/  $name lt 
$num);
   }
 




cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Robert Haas
On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 I prefer archive_cleanup_command. We should name things after their
 principal function, not an implementation detail, IMNSHO.

 Weak preference for archive_cleanup_command here.

OK, sounds like we have consensus on that.  Who wants to do it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Takahiro Itagaki

Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andrew Dunstan and...@dunslane.net writes:
  I prefer archive_cleanup_command. We should name things after their
  principal function, not an implementation detail, IMNSHO.
 
  Weak preference for archive_cleanup_command here.
 
 OK, sounds like we have consensus on that.  Who wants to do it?

Do we just need to replace all of them? If so, patch attached.
I replaced 3 terms: recovery_end_command, recovery-end-command,
and recoveryEndCommand.


Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



archive_cleanup_command.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Robert Haas
On Tue, Jun 8, 2010 at 9:45 PM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:
 Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andrew Dunstan and...@dunslane.net writes:
  I prefer archive_cleanup_command. We should name things after their
  principal function, not an implementation detail, IMNSHO.
 
  Weak preference for archive_cleanup_command here.

 OK, sounds like we have consensus on that.  Who wants to do it?

 Do we just need to replace all of them? If so, patch attached.
 I replaced 3 terms: recovery_end_command, recovery-end-command,
 and recoveryEndCommand.

I think we're replacing restartpoint_command, not recovery_end_command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Fujii Masao
On Wed, Jun 9, 2010 at 10:45 AM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:

 Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andrew Dunstan and...@dunslane.net writes:
  I prefer archive_cleanup_command. We should name things after their
  principal function, not an implementation detail, IMNSHO.
 
  Weak preference for archive_cleanup_command here.

 OK, sounds like we have consensus on that.  Who wants to do it?

 Do we just need to replace all of them? If so, patch attached.
 I replaced 3 terms: recovery_end_command, recovery-end-command,
 and recoveryEndCommand.

s/recovery_end_command/restartpoint_command?

I prefer restartpoint_command over archive_cleanup_command because
not only restartpoint_command but also recovery_end_command is used
for archive cleanup.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Takahiro Itagaki

Robert Haas robertmh...@gmail.com wrote:
 I think we're replacing restartpoint_command, not recovery_end_command.

Ah, sorry. I did the same replacement for restartpoint_command
in _, -, and camel case words.

BTW, should we also have a release note for the command?
I added a simple description for it in the patch.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



estartpoint-to-archive_cleanup.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-08 Thread Robert Haas
On Tue, Jun 8, 2010 at 10:18 PM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:

 Robert Haas robertmh...@gmail.com wrote:
 I think we're replacing restartpoint_command, not recovery_end_command.

 Ah, sorry. I did the same replacement for restartpoint_command
 in _, -, and camel case words.

Gah.  Perhaps one of these days we will stop spelling every identifier
in multiple different ways.

 BTW, should we also have a release note for the command?
 I added a simple description for it in the patch.

Yeah, it should be definitely mentioned in the release notes somewhere, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-22 Thread Greg Stark
On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:

 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.

 Would it be better to call this archive_cleanup_command? That might
 help people understand the need for and the use of this parameter.

This is bikeshedding but fwiw I like Simon's suggestion.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-18 Thread Heikki Linnakangas
Committed.

Heikki Linnakangas wrote:
 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.
 This was discussed at
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
 other things.
 
 Per discussion, attached patch adds a new restartpoint_command option to
 recovery.conf. That's an external shell command just like
 recovery_end_command that's executed at every restartpoint. You can use
 the %r parameter to pass the filename of the oldest WAL file that needs
 to be retained.
 
 While developing this I noticed that %r in recovery_end_command is not
 working correctly:
 
 LOG:  redo done at 0/14000C10
 LOG:  last completed transaction was at log time 2000-01-01
 02:21:08.816445+02
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/00010014': No
 such file or directory
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/0002.history': No such file
 or directory
 LOG:  selected new timeline ID: 2
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/0001.history': No such file
 or directory
 LOG:  archive recovery complete
 LOG:  checkpoint starting: end-of-recovery immediate wait
 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
 total=0.003 s
 LOG:  executing recovery_end_command echo recovery_end_command %r
 recovery_end_command 
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 
 Note how %r is always expanded to . That's
 because %r is expanded only when InRedo is true, which makes sense for
 restore_command where that piece of code was copy-pasted from, but it's
 never true anymore when recovery_end_command is run. The attached patch
 fixes that too.
 
 Barring objections, I will commit this later today.
 
 


-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-18 Thread Simon Riggs
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:

 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.
 This was discussed at
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
 other things.

...

 Barring objections, I will commit this later today.

Would it be better to call this archive_cleanup_command? That might
help people understand the need for and the use of this parameter.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Command to prune archive at restartpoints

2010-03-17 Thread Heikki Linnakangas
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.
This was discussed at
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
other things.

Per discussion, attached patch adds a new restartpoint_command option to
recovery.conf. That's an external shell command just like
recovery_end_command that's executed at every restartpoint. You can use
the %r parameter to pass the filename of the oldest WAL file that needs
to be retained.

While developing this I noticed that %r in recovery_end_command is not
working correctly:

LOG:  redo done at 0/14000C10
LOG:  last completed transaction was at log time 2000-01-01
02:21:08.816445+02
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/00010014': No
such file or directory
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/0002.history': No such file
or directory
LOG:  selected new timeline ID: 2
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/0001.history': No such file
or directory
LOG:  archive recovery complete
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.003 s
LOG:  executing recovery_end_command echo recovery_end_command %r
recovery_end_command 
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

Note how %r is always expanded to . That's
because %r is expanded only when InRedo is true, which makes sense for
restore_command where that piece of code was copy-pasted from, but it's
never true anymore when recovery_end_command is run. The attached patch
fixes that too.

Barring objections, I will commit this later today.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 6404768..7b7a8bd 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -59,14 +59,15 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   /listitem
  /varlistentry
 
- varlistentry id=recovery-end-command xreflabel=recovery_end_command
-  termvarnamerecovery_end_command/varname (typestring/type)/term
+ varlistentry id=restartpoint-command xreflabel=restartpoint_command
+  termvarnamerestartpoint_command/varname (typestring/type)/term
   listitem
para
-This parameter specifies a shell command that will be executed once only
-at the end of recovery. This parameter is optional. The purpose of the
-varnamerecovery_end_command/ is to provide a mechanism for cleanup
-following replication or recovery.
+This parameter specifies a shell command that will be executed at
+every restartpoint. This parameter is optional. The purpose of the
+varnamerestartpoint_command/ is to provide a mechanism for cleaning
+up old archived WAL files that are no longer needed by the standby
+server.
 Any literal%r/ is replaced by the name of the file
 containing the last valid restart point. That is the earliest file that
 must be kept to allow a restore to be restartable, so this information
@@ -79,6 +80,24 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
/para
para
 If the command returns a non-zero exit status then a WARNING log
+message will be written.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry id=recovery-end-command xreflabel=recovery_end_command
+  termvarnamerecovery_end_command/varname (typestring/type)/term
+  listitem
+   para
+This parameter specifies a shell command that will be executed once only
+at the end of recovery. This parameter is optional. The purpose of the
+varnamerecovery_end_command/ is to provide a mechanism for cleanup
+following replication or recovery.
+Any literal%r/ is replaced by the name of the file containing the
+last valid restart point, like in xref linkend=restartpoint-command.
+   /para
+   para
+If the command returns a non-zero exit status then a WARNING log
 message will be written and the database will proceed to start up
 anyway.  An exception is that if the command was terminated by a
 signal, the database will not proceed with startup.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 995794a..519526e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -171,6 +171,7 @@ static bool restoredFromArchive = false;
 /* options taken from recovery.conf for archive recovery */
 static char 

Re: [HACKERS] Command to prune archive at restartpoints

2010-03-17 Thread Greg Stark
On Wed, Mar 17, 2010 at 9:37 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.

I'm still finding this kind of narrow-minded. I'm picturing a system
with multiple replicas -- obvious no one replica can take it upon
itself to delete archived log files based only on its own
restartpoint. And besides, if you're using the archived log files for
backups you also need to take into account the backup policy and only
delete files that aren't needed for a consistent backup and aren't
needed for the replica.

What we need is a program which can take all this information from all
your slaves and backup labels into account and implement your backup
policies. It probably won't exist in time for the release and in any
case doesn't really have to ship with Postgres. There might even be
more than one.

But do we have all the information that such a program would need? Is
there a way to connect to a replica and ask it what the restart point
is? I suppose with this new command you could always just make it a
command which wakes up this demon and sends it the restart point and
the replica id and it can update its internal state and recalculate
what archives are needed. It is a bit nerve-wracking that it's
dependent on its internal state remembering the restart points it's
been given though.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-17 Thread Heikki Linnakangas
Greg Stark wrote:
 On Wed, Mar 17, 2010 at 9:37 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.
 
 I'm still finding this kind of narrow-minded. I'm picturing a system
 with multiple replicas -- obvious no one replica can take it upon
 itself to delete archived log files based only on its own
 restartpoint. And besides, if you're using the archived log files for
 backups you also need to take into account the backup policy and only
 delete files that aren't needed for a consistent backup and aren't
 needed for the replica.

That's why we provide options that take any shell command you want,
rather than e.g a path to an archive directory that's pruned automatically.

For example, if you have multiple standbys sharing one archive, you
could do something like this:

In each standby, have a restartpoint_command along the lines of:
echo %r  archivedirectory/standby1_location; archive_cleanup.sh

Where '1' is different for every standby

and in archive_cleanup.sh, scan through all the standbyX_location files,
take the minimum, and delete all files smaller than that.

You'll need some care with locking etc., but the point is that the
current hooks allow you to implement complex setups like that.

 What we need is a program which can take all this information from all
 your slaves and backup labels into account and implement your backup
 policies. It probably won't exist in time for the release and in any
 case doesn't really have to ship with Postgres. There might even be
 more than one.

I guess I just described such a program :-). Yeah, I'd imagine that to
become part of toolkits like skytools.

 But do we have all the information that such a program would need? Is
 there a way to connect to a replica and ask it what the restart point
 is?

Hmm, Greg Smith opened a thread on exposing the fields in the control
file as user-defined functions. IIRC last restartpoint location was the
piece of information that triggered the discussion this time. Perhaps we
should indeed add a function to expose that in 9.0.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers