Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-07 Thread Chris Travers
On Thu, Sep 7, 2017 at 2:47 PM, Alvaro Herrera 
wrote:

> After reading this discussion, I agree that pg_rewind needs to become
> smarter in order to be fully useful in production environments; right
> now there are many warts and corner cases that did not seem to have been
> considered during the initial development (which I think is all right,
> taking into account that its mere concept was complicated enough; so we
> need not put any blame on the original developers, rather the contrary).
>

Agreed with this assessment.  As an answer to the problem that "base
backups take too long to take and transfer", both the approach and its
corner cases make a lot of sense.

>
> I think we need to make the program simple to use (i.e. not have the
> user write shell globs for the common log file naming patterns) while
> remaining powerful (i.e. not forcibly copy any files that do not match
> hardcoded globs).


I would add that well-defined tasks are a key aspect of powerful software
in my view and here the well defined task is to restore data states to a
particular usable timeline point taken from another system.  If that is
handled well, that opens up new uses and makes some problems that are
difficult right now much easier to solve.


>   Is the current dry-run mode enough to give the user
> peace of mind regarding what would be done in terms of testing globs
> etc?  If not, maybe the debug mode needs to be improved (for instance,
> have it report the file size for each file that would be copied;
> otherwise you may not notice it's going to copy the 20GB log file until
> it's too late ...)
>

The dry run facility solves one problem in one circumstance, namely a
manually invoked run of the software along with the question of "will this
actually rewind?"  I suppose software developers might be able to use it
to back up and restore things that are to be clobbered (but is anyone likely
to on the software development side?).  I don't see anything in that corner
that can be improved without over-engineering the solution.

There are two reasons I am skeptical that a dry-run mode will ever be
"enough."

The first is that pg_rewind is often integrated into auto-failover/back
tools and the chance of running it in a dry-run mode before it is
automatically triggered is more or less nil.  These are also the cases
where you won't notice it does something bad until much later.

The second is that there are at least some corner cases we may need to
define as outside the responsibility of pg_rewind.  The one that comes to
mind here is if I am rewinding back past the creation of a small table.  I
don't see an easy or safe way to address that from inside pg_rewind without
a lot of complication.  It might be better to have a dedicated tool for
that.


>
> Now, in order for any of this to happen, there will need to be a
> champion to define what the missing pieces are, write all those patches
> and see them through the usual (cumbersome, slow) procedure.  Are you,
> Chris, willing to take the lead on that?
>

Yeah. I think the first step would be a list of the corner cases and a
proposal for how I think it should work.  Then maybe a roadmap of patches,
and then submitting them as they become complete.


> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-07 Thread Alvaro Herrera
After reading this discussion, I agree that pg_rewind needs to become
smarter in order to be fully useful in production environments; right
now there are many warts and corner cases that did not seem to have been
considered during the initial development (which I think is all right,
taking into account that its mere concept was complicated enough; so we
need not put any blame on the original developers, rather the contrary).

I think we need to make the program simple to use (i.e. not have the
user write shell globs for the common log file naming patterns) while
remaining powerful (i.e. not forcibly copy any files that do not match
hardcoded globs).  Is the current dry-run mode enough to give the user
peace of mind regarding what would be done in terms of testing globs
etc?  If not, maybe the debug mode needs to be improved (for instance,
have it report the file size for each file that would be copied;
otherwise you may not notice it's going to copy the 20GB log file until
it's too late ...)
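
To make the file-size reporting idea concrete, here is a rough Python
sketch (pg_rewind itself is C, and this is not its code).  It only walks a
directory and lists every file with its size, largest first, so a 20GB log
file would show up at the top before anything is copied.  The function name
dry_run_report is made up for illustration.

```python
import os


def dry_run_report(source_dir):
    """List (relative_path, size_in_bytes) for every file under source_dir.

    Hypothetical helper: it copies nothing, it only reports what a copy
    of the whole directory would involve, largest files first.
    """
    report = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, source_dir)
            report.append((rel_path, os.path.getsize(full_path)))
    # Largest first; ties are broken by path so the output is stable.
    report.sort(key=lambda item: (-item[1], item[0]))
    return report


if __name__ == "__main__":
    import sys

    for rel_path, size in dry_run_report(sys.argv[1]):
        print(f"{size:>15}  {rel_path}")
```

Sorting by size is a deliberate choice here: the surprising files are the
ones a user most needs to see first.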

Now, in order for any of this to happen, there will need to be a
champion to define what the missing pieces are, write all those patches
and see them through the usual (cumbersome, slow) procedure.  Are you,
Chris, willing to take the lead on that?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-07 Thread Chris Travers
One more side to this which is relevant to other discussions.

If I am rewinding back to before when a table was created, the current
algorithm as well as any proposed algorithms will delete the reference to
the relfilenode in the catalogs but not the file itself.  I don't see how
an undo subsystem would fix this.

Is this a reason to revisit the idea of a pg_fsck utility that could be
run immediately after a rewind?
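
A rough illustration of what such a check could look like, in Python.
pg_fsck does not exist; the function name find_orphans is hypothetical, and
I am assuming the set of relfilenodes known to the catalogs has been
gathered some other way (which is itself not trivial, e.g. for mapped
catalogs).  The file name pattern covers plain relation files, extra
segments (.1, .2, ...) and the _fsm/_vm/_init forks.

```python
import re

# <relfilenode>, optionally a fork suffix, optionally a segment number.
_RELFILE_RE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")


def find_orphans(file_names, known_relfilenodes):
    """Return relation-like file names whose relfilenode is not known.

    file_names: names found in one database directory (base/<dboid>/).
    known_relfilenodes: set of ints taken from the catalogs (assumed input).
    Names that do not look like relation files are ignored.
    """
    orphans = []
    for name in file_names:
        match = _RELFILE_RE.match(name)
        if match and int(match.group(1)) not in known_relfilenodes:
            orphans.append(name)
    return sorted(orphans)


if __name__ == "__main__":
    on_disk = ["16384", "16384_fsm", "16390", "16390.1", "PG_VERSION"]
    print(find_orphans(on_disk, {16384}))
```

This only detects candidates; whether it is safe to remove them is a
separate question.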

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Vladimir Borodin

> On 5 Sep 2017, at 15:48, Michael Paquier wrote:
> 
> On Tue, Sep 5, 2017 at 9:40 PM, Vladimir Borodin  wrote:
>> We do compress WALs and send them over network. Doing it via archive_command
>> in single thread is sometimes slower than new WALs are written under heavy
>> load.
> 
> Ah, yeah, true. I do use pg_receivexlog --compress for that locally
> and do a bulk copy of only the compressed WALs needed, when needed...
> So there is a guarantee that completed segments are durable locally,
> which is very useful.

It seems that the --compress option appeared only in PostgreSQL 10, which is
not ready for production yet. BTW, I assume that pg_receivexlog is
single-threaded too? If so, it may still be the bottleneck when 3-5 WALs per
second are written.

> You should definitely avoid putting that in
> PGDATA though, the same counts for tablespaces within PGDATA for
> example.

I would love to, but there might be some problems with archiving, and in many
cases the only partition with enough space to accumulate WALs is the one
holding PGDATA.

--
May the force be with you…
https://simply.name



Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Vladimir Borodin

> On 5 Sep 2017, at 15:42, Chris Travers wrote:
> 
> On Tue, Sep 5, 2017 at 2:40 PM, Vladimir Borodin wrote:
> 
>> On 5 Sep 2017, at 14:04, Michael Paquier wrote:
>> 
>>> For example, in archive_command we put WALs for archiving from
>>> pg_xlog/pg_wal into another directory inside PGDATA and then another cron
>>> task makes real archiving. This directory ideally should be skipped by
>>> pg_rewind, but it would not be handled by proposed change.
>> 
>> I would be curious to follow the reasoning for such a two-phase
>> archiving (You basically want to push it in two places, no? But why
>> not just use pg_receivexlog then?). This is complicated to handle from
>> the point of view of availability and backup reliability + durability.
> 
> We do compress WALs and send them over network. Doing it via archive_command 
> in single thread is sometimes slower than new WALs are written under heavy 
> load.
> 
> How would this work when it comes to rewinding against a file directory? 

Very badly, of course. Sometimes we get 'could not remove file 
"/var/lib/postgresql/9.6/data/wals/000100C300C6": No such file or 
directory' while running pg_rewind ($PGDATA/wals is a directory where 
archive_command copies WALs). That’s why I want to solve the initial problem. 
Both proposed solutions (using only needed files and skipping files through 
glob/regex) are fine with me, but not the initial patch.

--
May the force be with you…
https://simply.name



Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Michael Paquier
On Tue, Sep 5, 2017 at 9:40 PM, Vladimir Borodin  wrote:
> We do compress WALs and send them over network. Doing it via archive_command
> in single thread is sometimes slower than new WALs are written under heavy
> load.

Ah, yeah, true. I do use pg_receivexlog --compress for that locally
and do a bulk copy of only the compressed WALs needed, when needed...
So there is a guarantee that completed segments are durable locally,
which is very useful. You should definitely avoid putting that in
PGDATA though; the same goes for tablespaces within PGDATA, for
example.
-- 
Michael




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 2:40 PM, Vladimir Borodin  wrote:

>
> On 5 Sep 2017, at 14:04, Michael Paquier wrote:
>
> For example, in archive_command we put WALs for archiving from
> pg_xlog/pg_wal into another directory inside PGDATA and then another cron
> task makes real archiving. This directory ideally should be skipped by
> pg_rewind, but it would not be handled by proposed change.
>
>
> I would be curious to follow the reasoning for such a two-phase
> archiving (You basically want to push it in two places, no? But why
> not just use pg_receivexlog then?). This is complicated to handle from
> the point of view of availability and backup reliability + durability.
>
>
> We do compress WALs and send them over network. Doing it via
> archive_command in single thread is sometimes slower than new WALs are
> written under heavy load.
>

How would this work when it comes to rewinding against a file directory?

>
> --
> May the force be with you…
> https://simply.name
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 1:04 PM, Michael Paquier 
wrote:

> On Tue, Sep 5, 2017 at 7:54 PM, Vladimir Borodin  wrote:
> > On 5 Sep 2017, at 12:31, Chris Travers wrote:
> >
> > I think the simplest solution for now is to skip any files ending in
> > .conf, .log, and serverlog.
>
> This is not a portable solution. Users can include configuration files
> with the names they want. So the current patch as proposed is
> definitely not something worth it.
>

Actually that is exactly why I think the long-term solution is to figure
out what we need to copy and not copy anything we don't recognise.

That means the following directories as far as I can see:
 * base
 * global
 * pg_xlog/pg_wal
 * pg_clog/pg_xact
 * pg_commit_ts
 * pg_twophase
 * pg_snapshots?

Are there any other directories I am missing?
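
To illustrate the opt-in idea, a small Python sketch (not pg_rewind code;
the names KNOWN_DIRS and should_copy are invented here).  A path is copied
only if its top-level directory is on the list above; pg_snapshots is left
out because I marked it with a question mark.

```python
import posixpath

# Top-level directories from the list above.  Both the old and the new
# names (pg_xlog/pg_wal, pg_clog/pg_xact) are accepted.
KNOWN_DIRS = frozenset({
    "base", "global",
    "pg_xlog", "pg_wal",
    "pg_clog", "pg_xact",
    "pg_commit_ts", "pg_twophase",
})


def should_copy(rel_path):
    """Return True if rel_path (relative to PGDATA, '/'-separated)
    lives under one of the known directories."""
    # normpath collapses "./" and "../" so "base/../recovery.conf"
    # is judged as "recovery.conf".
    top_level = posixpath.normpath(rel_path).split("/", 1)[0]
    return top_level in KNOWN_DIRS


if __name__ == "__main__":
    for path in ("base/16384/2619", "postgresql.conf", "wals/0001"):
        print(path, should_copy(path))
```

Anything unrecognised (config files, server logs, a site-specific archive
directory) is simply never touched, without anyone having to name it.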


At any rate, I think the current state makes it very difficult to test
rewind adequately, and it makes it extremely difficult to use in a
non-trivial environment because you have to handle replication slots,
configuration files, and so forth yourself, and you have to be aware that
these *may* or *may not* be consistently clobbered by a rewind, so you have
to have some way of applying another set of files following a rewind.

If nothing else we ought to *at least* special case the recovery.conf and
the postgresql.auto.conf, and pg_replslot because these are always located
there and should never be clobbered.
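
As a minimal sketch of that special case (Python for brevity, names
invented for illustration, not taken from any patch):

```python
import posixpath

# Files that always live at the top of PGDATA and must survive a rewind.
PROTECTED_FILES = frozenset({"recovery.conf", "postgresql.auto.conf"})
# Directories whose whole contents must survive a rewind.
PROTECTED_DIRS = frozenset({"pg_replslot"})


def is_protected(rel_path):
    """Return True if rel_path (relative to PGDATA, '/'-separated)
    should never be overwritten or removed on the target."""
    parts = posixpath.normpath(rel_path).split("/")
    if parts[0] in PROTECTED_DIRS:
        return True
    # Only top-level files count; base/1/recovery.conf is not special.
    return len(parts) == 1 and parts[0] in PROTECTED_FILES


if __name__ == "__main__":
    for path in ("recovery.conf", "pg_replslot/standby1/state", "base/1/1259"):
        print(path, is_protected(path))
```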


>
> > For example, in archive_command we put WALs for archiving from
> > pg_xlog/pg_wal into another directory inside PGDATA and then another cron
> > task makes real archiving. This directory ideally should be skipped by
> > pg_rewind, but it would not be handled by proposed change.
>
> I would be curious to follow the reasoning for such a two-phase
> archiving (You basically want to push it in two places, no? But why
> not just use pg_receivexlog then?). This is complicated to handle from
> the point of view of availability and backup reliability + durability.
>
> > While it is definitely an awful idea the user can easily put something
> > strange (i.e. logs) to any important directory in PGDATA (i.e. into base
> > or pg_wal). Or how for example pg_replslot should be handled (I asked
> > about it a couple of years ago [1])? It seems that a glob/regexp for
> > things to skip is a more universal solution.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-4CA2-8CF6-40B382F1F699%40simply.name
>
> Well, keeping the code simple is not always a bad thing. Logs are an
> example that can be easily countered, as well as archives in your
> case.
>




> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Vladimir Borodin

> On 5 Sep 2017, at 14:04, Michael Paquier wrote:
> 
>> For example, in archive_command we put WALs for archiving from
>> pg_xlog/pg_wal into another directory inside PGDATA and then another cron
>> task makes real archiving. This directory ideally should be skipped by
>> pg_rewind, but it would not be handled by proposed change.
> 
> I would be curious to follow the reasoning for such a two-phase
> archiving (You basically want to push it in two places, no? But why
> not just use pg_receivexlog then?). This is complicated to handle from
> the point of view of availability and backup reliability + durability.

We do compress WALs and send them over network. Doing it via archive_command in 
single thread is sometimes slower than new WALs are written under heavy load.

--
May the force be with you…
https://simply.name



Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 12:54 PM, Vladimir Borodin  wrote:

>
> On 5 Sep 2017, at 12:31, Chris Travers wrote:
>
> I think the simplest solution for now is to skip any files ending in
> .conf, .log, and serverlog.
>
>
> Why don’t you want to solve the problem once? It is a bit harder to get
> consensus on a way how to do it, but it seems that there are no reasons to
> make temporary solution here.
>
> For example, in archive_command we put WALs for archiving from
> pg_xlog/pg_wal into another directory inside PGDATA and then another cron
> task makes real archiving. This directory ideally should be skipped by
> pg_rewind, but it would not be handled by proposed change.
>

Ok, let's back up a bit in terms of what I see as the proper long-term fix.
Simple code, by the way, is important, but at least as important are
programs which solve simple, well-defined problems.  The current state is:

1.  pg_rewind makes *no guarantee* as to whether differences in logs,
config files, etc. are clobbered.  They may be (if a rewind is needed) or
not (if the timelines haven't diverged).  Therefore the behavior of these
sorts of files with the invocation of pg_rewind is not really very well
defined.  That's a fairly big issue in an operational environment.

2.  There are files which *may* be copied (i.e. are copied if the timelines
have diverged) which *may* have side effects on replication topology, WAL
archiving, etc.  Replication slots are a good example.

The problem I think pg_rewind should solve is "give me a consistent data
environment from the timeline on that server."  I would think that access
to the xlog/clog files would indeed be relevant to that.  If I were
rewriting the application now I would include those.  Just because
something can be handled separately doesn't mean it should be, and I would
prefer not to assume that archiving is properly set up and working.

>
>
> Long run, it would be nice to change pg_rewind from an opt-out approach to
> an approach of processing the subdirectories we know are important.
>
>
> While it is definitely an awful idea the user can easily put something
> strange (i.e. logs) to any important directory in PGDATA (i.e. into base or
> pg_wal). Or how for example pg_replslot should be handled (I asked about it
> a couple of years ago [1])? It seems that a glob/regexp for things to skip
> is a more universal solution.
>

I am not convinced it is a universal solution unless you take an arbitrary
number of regexes to check and loop through checking all of them.  Then the
chance of getting something catastrophically wrong in a critical
environment goes way up and you may end up in an inconsistent state at the
end.

Simple code is good.  A program that solves simple problems reliably (and
in simple ways) is better.

The problem I see is that pg_rewind gets incorporated into other tools
which don't really provide the user with before or after hooks, and therefore
it isn't really fair to say, for example, that repmgr has the responsibility
to copy server logs out if present, or to make sure that configuration files
are not in the directory.

The universal solution is to only touch the files that we know are needed
and therefore work simply and reliably in a demanding environment.


>
> [1] https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-4CA2-8CF6-40B382F1F699%40simply.name
>
>
> --
> May the force be with you…
> https://simply.name
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Michael Paquier
On Mon, Sep 4, 2017 at 10:38 PM, Alvaro Herrera  wrote:
> I wonder how portable fnmatch() is in practice (which we don't currently
> use anywhere).  A shell glob seems a more natural interface to me for
> this than a regular expression.

On Windows you could use roughly PathMatchSpecEx, but it does not seem
that all the wildcards of fnmatch are available there.
-- 
Michael




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Michael Paquier
On Tue, Sep 5, 2017 at 7:54 PM, Vladimir Borodin  wrote:
> On 5 Sep 2017, at 12:31, Chris Travers wrote:
>
> I think the simplest solution for now is to skip any files ending in .conf,
> .log, and serverlog.

This is not a portable solution. Users can include configuration files
with the names they want. So the current patch as proposed is
definitely not something worth it.

> For example, in archive_command we put WALs for archiving from
> pg_xlog/pg_wal into another directory inside PGDATA and then another cron
> task makes real archiving. This directory ideally should be skipped by
> pg_rewind, but it would not be handled by proposed change.

I would be curious to follow the reasoning for such a two-phase
archiving (You basically want to push it in two places, no? But why
not just use pg_receivexlog then?). This is complicated to handle from
the point of view of availability and backup reliability + durability.

> While it is definitely an awful idea the user can easily put something
> strange (i.e. logs) to any important directory in PGDATA (i.e. into base or
> pg_wal). Or how for example pg_replslot should be handled (I asked about it
> a couple of years ago [1])? It seems that a glob/regexp for things to skip
> is a more universal solution.
>
> [1]
> https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-4CA2-8CF6-40B382F1F699%40simply.name

Well, keeping the code simple is not always a bad thing. Logs are an
example that can be easily countered, as well as archives in your
case.
-- 
Michael




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Vladimir Borodin

> On 5 Sep 2017, at 12:31, Chris Travers wrote:
> 
> I think the simplest solution for now is to skip any files ending in .conf, 
> .log, and serverlog.

Why don’t you want to solve the problem once and for all? It is a bit harder
to get consensus on how to do it, but there seems to be no reason to make a
temporary solution here.

For example, in archive_command we put WALs for archiving from pg_xlog/pg_wal 
into another directory inside PGDATA, and then another cron task does the real 
archiving. This directory ideally should be skipped by pg_rewind, but it would 
not be handled by the proposed change.

> 
> Long run, it would be nice to change pg_rewind from an opt-out approach to an 
> approach of processing the subdirectories we know are important.

While it is definitely an awful idea, the user can easily put something strange 
(e.g. logs) into any important directory in PGDATA (e.g. into base or pg_wal). 
And how, for example, should pg_replslot be handled (I asked about it a couple 
of years ago [1])? It seems that a glob/regexp for things to skip is a more 
universal solution.

[1] 
https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-4CA2-8CF6-40B382F1F699%40simply.name


--
May the force be with you…
https://simply.name



Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Mon, Sep 4, 2017 at 3:38 PM, Alvaro Herrera 
wrote:

> Chris Travers wrote:
> > On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier <michael.paqu...@gmail.com>
> > wrote:
> >
> > > On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier wrote:
> > > > A simple idea would be to pass as a parameter a regex on which we
> > > > check files to skip when scanning the directory of the target
> > > > remotely or locally. This needs to be used with care though, it
> > > > would be easy to corrupt an instance.
> > >
> > > I actually shortcut that with a strategy similar to base backups: logs
> > > are on another partition, log_directory uses an absolute path, and
> > > PGDATA has no reference to the log path.
> >
> > Yeah, it is quite possible to move all these out of the data directory,
> > but bad things can happen when you accidentally copy configuration or
> > logs over those on the target and expecting that all environments will
> > be properly set up to avoid these problems is not always a sane
> > assumption.
>
> I agree that operationally it's better if these files weren't in PGDATA
> to start with, but from a customer support perspective, things are
> frequently not already set up like that, so failing to support that
> scenario is a loser.
>
> I wonder how portable fnmatch() is in practice (which we don't currently
> use anywhere).  A shell glob seems a more natural interface to me for
> this than a regular expression.
>

I think the simplest solution for now is to skip any files ending in .conf,
.log, and serverlog.

Long run, it would be nice to change pg_rewind from an opt-out approach to
an approach of processing the subdirectories we know are important.

It is worth noting further that if you rewind in the wrong way, in a
cascading replication environment, you can accidentally change your
replication topology if you clobber the recovery.conf from another replica
and there is no way to ensure that this file is not in the data directory
since it MUST be put there.

Best Wishes,
Chris Travers


>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Alvaro Herrera
Chris Travers wrote:
> On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier 
> wrote:
> 
> > On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier
> >  wrote:
> > > A simple idea would be to pass as a parameter a regex on which we
> > > check files to skip when scanning the directory of the target remotely
> > > or locally. This needs to be used with care though, it would be easy
> > > to corrupt an instance.
> >
> > I actually shortcut that with a strategy similar to base backups: logs
> > are on another partition, log_directory uses an absolute path, and
> > PGDATA has no reference to the log path.
> 
> Yeah, it is quite possible to move all these out of the data directory, but
> bad things can happen when you accidentally copy configuration or logs over
> those on the target and expecting that all environments will be properly
> set up to avoid these problems is not always a sane assumption.

I agree that operationally it's better if these files weren't in PGDATA
to start with, but from a customer support perspective, things are
frequently not already set up like that, so failing to support that
scenario is a loser.

I wonder how portable fnmatch() is in practice (which we don't currently
use anywhere).  A shell glob seems a more natural interface to me for
this than a regular expression.
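
For illustration only, this is roughly what a glob-based skip list looks
like, sketched with Python's fnmatch module rather than the C fnmatch()
(the function name is_skipped and the example patterns are invented).
Note that in Python's module `*` also matches `/`, so a bare `*.log`
matches at any depth.

```python
from fnmatch import fnmatchcase


def is_skipped(rel_path, skip_globs):
    """Return True if rel_path matches any of the user-supplied globs.

    fnmatchcase is used so matching is case-sensitive on every platform.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in skip_globs)


if __name__ == "__main__":
    globs = ["pg_log/*", "*.conf", "wals/*"]
    for path in ("pg_log/postgresql-Mon.log", "postgresql.conf",
                 "wals/0001", "base/1/1259"):
        print(path, is_skipped(path, globs))
```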

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier 
wrote:

> On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier
>  wrote:
> > A simple idea would be to pass as a parameter a regex on which we
> > check files to skip when scanning the directory of the target remotely
> > or locally. This needs to be used with care though, it would be easy
> > to corrupt an instance.
>
> I actually shortcut that with a strategy similar to base backups: logs
> are on another partition, log_directory uses an absolute path, and
> PGDATA has no reference to the log path.
>

Yeah, it is quite possible to move all these out of the data directory, but
bad things can happen when you accidentally copy configuration or logs over
those on the target and expecting that all environments will be properly
set up to avoid these problems is not always a sane assumption.

So consequently, I think it would be good to fix in the tool.  The
fundamental question is if there is any reason someone would actually want
to copy config files over.


> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
Ok so I have a proof of concept patch here.

This is proof of concept only.  It does not change documentation or the
like.

The purpose of the patch is discussion on the "do we want this" side.

The patch is fairly trivial but I have not added test cases or changed docs
yet.

Intention of the patch:
pg_rewind is an important backbone tool for recovering data directories
following a switchover.  However, it is currently over-inclusive as to what
it copies.  This patch excludes any file ending in "serverlog", ".conf",
and ".log" because these are never directly related to the data being
rewound and add a great deal of complexity to switchovers.

.conf files are excluded for two major reasons.  The first is that often we
may want to put postgresql.conf and other files in the data directory, and
if we change these during switchover this can change, for example, the port
the database is running on or other things that can break production or
testing environments.  This is usually a problem with testing environments,
but it could happen with production environments as well.

A much larger concern with .conf files though is the recovery.conf.  This
file MUST be put in the data directory, and it helps determine the
replication topology regarding cascading replication and the like.  If you
rewind from an upstream replica, you suddenly change the replication
topology and that can have wide-ranging impacts.

I think we are much better off insisting that .conf files should be copied
separately because the scenarios where you want to do so are more limited
and the concerns are often separate from rewinding the timeline itself.

The second major exclusion added is files ending in "serverlog" and
".log".  I can find no good reason why server logs from the source should
*ever* clobber those on the target.  If you do this, you lose historical
information relating to problems and introduce management issues.
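
The exclusion rule itself is nothing more than a suffix test.  The attached
patch is C; the following is only a Python paraphrase of the rule as
described above, with invented names, to make the behaviour easy to try
out.

```python
# Suffixes named in the description above.
EXCLUDED_SUFFIXES = ("serverlog", ".conf", ".log")


def is_excluded(rel_path):
    """Return True if rel_path ends in one of the excluded suffixes."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return rel_path.endswith(EXCLUDED_SUFFIXES)


if __name__ == "__main__":
    for path in ("postgresql.conf", "recovery.conf", "pg_log/serverlog",
                 "base/16384/2619", "conf.d/extra.cfg"):
        print(path, is_excluded(path))
```

The last example shows the obvious limit of the rule: a configuration file
that does not end in .conf is not caught.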


Backwards-incompatibility scenarios:
If somehow one has a workflow that depends on copying .conf files, this
would break that.  I cannot think of any cases where anyone would actually
want to do this but that doesn't mean they aren't out there.  If people
really want to, then they need to copy the configuration files they want
separately.

Next Steps:

If people like this idea I will add test cases and edit documentation as
appropriate.

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


pg_rewind_log_conf_patch.patch
Description: Binary data



Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Michael Paquier
On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier
 wrote:
> A simple idea would be to pass as a parameter a regex on which we
> check files to skip when scanning the directory of the target remotely
> or locally. This needs to be used with care though, it would be easy
> to corrupt an instance.

I actually shortcut that with a strategy similar to base backups: logs
are on another partition, log_directory uses an absolute path, and
PGDATA has no reference to the log path.
-- 
Michael




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Michael Paquier
On Mon, Sep 4, 2017 at 5:53 PM, Chris Travers  wrote:
> In some experiments with pg_rewind and repmgr I noticed that local testing
> is complicated by the fact that pg_rewind appears to copy configuration
> files from the source to target directory.
>
> I would propose to make a modest patch to exclude postgresql.conf,
> pg_hba.conf, and pg_ident.conf from the file tree traversal.
>
> Any feedback before I create a proof of concept?

A simple idea would be to pass as a parameter a regex on which we
check files to skip when scanning the directory of the target remotely
or locally. This needs to be used with care though, it would be easy
to corrupt an instance.
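
A rough sketch of the idea in Python (pg_rewind has no such parameter
today; the function name filter_files is invented).  The second call in
the usage part shows how easily a careless pattern also skips relation
files.

```python
import re


def filter_files(rel_paths, skip_regex):
    """Split rel_paths into (kept, skipped) using a user-supplied regex.

    A path is skipped if the regex matches anywhere in it.
    """
    pattern = re.compile(skip_regex)
    kept, skipped = [], []
    for path in rel_paths:
        if pattern.search(path):
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped


if __name__ == "__main__":
    files = ["postgresql.conf", "pg_log/a.log", "base/1/1259"]
    # Anchored pattern: only config and log files are skipped.
    print(filter_files(files, r"\.(conf|log)$"))
    # Careless pattern: "log" also appears in pg_xlog, so WAL is skipped.
    print(filter_files(["pg_xlog/0001", "pg_log/a.log"], r"log"))
```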
-- 
Michael




Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Sokolov Yura

On 2017-09-04 11:53, Chris Travers wrote:

> In some experiments with pg_rewind and repmgr I noticed that local
> testing is complicated by the fact that pg_rewind appears to copy
> configuration files from the source to target directory.
>
> I would propose to make a modest patch to exclude postgresql.conf,
> pg_hba.conf, and pg_ident.conf from the file tree traversal.
>
> Any feedback before I create a proof of concept?



We also had a production issue with pg_rewind, which copied huge text
logs from pg_log (20GB each, because statements were logged for
statistics). It would be convenient to be able to tell pg_rewind not to
copy logs too.

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




[HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
In some experiments with pg_rewind and repmgr I noticed that local testing
is complicated by the fact that pg_rewind appears to copy configuration
files from the source to target directory.

I would propose to make a modest patch to exclude postgresql.conf,
pg_hba.conf, and pg_ident.conf from the file tree traversal.

Any feedback before I create a proof of concept?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin