On 10/27/2014 05:58 PM, Tomas Vondra wrote:
On 27.10.2014 17:24, Heikki Linnakangas wrote:
On 10/27/2014 03:46 PM, Tom Lane wrote:
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
On 10/27/2014 03:21 PM, Tomas Vondra wrote:
Thinking about this a bit more, do we really need a full checkpoint?
That
is a checkpoint of all the databases in the cluster? Why
checkpointing the
source database is not enough?
A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).
Yeah.  After re-reading the 2005 thread, I wonder if we shouldn't just
bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log
all the copied files instead of doing a "cp -r"-equivalent directory
copy.
That would fix a number of existing replay hazards as well as making it
safe to do what Tomas wants.  In the small scale this would cause more
I/O
(2 copies of the template database's data) but in production situations
we might well come out ahead by avoiding a forced checkpoint of the rest
of the cluster.  Also I guess we could skip WAL-logging if WAL archiving
is off, similarly to the existing optimization for CREATE INDEX etc.
That would be a nasty surprise for anyone who's using CREATE DATABASE
as a fast way to clone a large database. But I would be OK with that,
at least if we can skip the WAL-logging with wal_level=minimal.
That's true. Sadly, I can't think of a solution that would address both
use cases at the same time :-(

The only thing I can think of is having two CREATE DATABASE "flavors".
One keeping the current approach (suitable for fast cloning) and one
with the WAL logging (minimizing the CREATE DATABASE duration the impact
on other backends).

It will probably make the code significantly more complex, which is not
exactly desirable, I guess. Also, if we keep the current code (even if
only as a special case) it won't eliminate the existing replay hazards
(which was one of the Tom's arguments for biting the bullet).

I'm also thinking that for wal_level=archive and large databases, this
won't really eliminate the checkpoint as it will likely generate enough
WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?

That being said, our CREATE DATABASE docs currently say this

     Although it is possible to copy a database other than template1 by
     specifying its name as the template, this is not (yet) intended as
     a general-purpose "COPY DATABASE" facility. The principal
     limitation is that no other sessions can be connected to the
     template database while it is being copied. CREATE DATABASE will
     fail if any other connection exists when it starts; otherwise, new
     connections to the template database are locked out until CREATE
     DATABASE completes. See Section 21.3 for more information.

I think that this limitation pretty much means no one should use CREATE
DATABASE for cloning live databases in production environment (because
of the locking).

It also seems to me the "general-purpose COPY DATABASE" described in the
docs is what we're describing in this thread.



Notwithstanding what the docs say, I have seen CREATE DATABASE used plenty of times, and quite effectively, to clone databases. I don't think making it do twice the IO in the general case is going to go down well.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to