Since Phase 1 is functioning and should soon be complete, we can now start thinking about Phase 2: full recovery to a point in time.
Previous thinking was that a command-line switch would specify recovery to a given point in time, rather than the default, which will be to recover all the way to the end of the (available) xlogs.
Recovering to a specific point in time forces us to consider the granularity of time. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction
Transaction log files currently have timestamps, so (1) is straightforward, but probably not the best we can do. We would roll forward until the xlog file time > desired point in time.
To make (2) work, we would have to have a timestamp associated with each transaction. This could live in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record with a timestamp > desired point in time.
Currently, neither of these places has a timestamp. Hmmmm. We can't use pg_control because we are assuming that it needs recovery...
I can't see any general way of adding a timestamp in less than 2 bytes. We don't need a timezone. The timestamp could be a number of seconds since the last checkpoint, since checkpoint frequency is already bounded by a GUC.

Although the code avoids a checkpoint if no updates have taken place, we wouldn't be too remiss to use a forced checkpoint every 32,000 seconds (about 9 hours), assuming that accuracy of the point in time is on the order of seconds. If we went to 0.1-second accuracy, we could force a checkpoint every 54 minutes or so. All of that seems too restrictive. If we went to milliseconds, then we could use a 4-byte value and force a checkpoint every 596 hours, or about 3.5 weeks. Thoughts?
Clog uses 2 bits per transaction, so even 2 extra bytes per transaction would make the clog 9 times larger than originally intended. This could well cause it to segment quicker, and I'm sure no one would be happy with that. So, let's not add anything to the clog.
The alternative is to make the last part of the XlogHeader record a timestamp value, increasing the size of each xlog write. It might be possible to make this part of the header optional, depending upon whether or not PITR is required, but my preference is against such dynamic coding.
So, I propose:
- appending 8 bytes of date/time data to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each xlog record (file header + offset) against the desired point in time, delivered to it by GUC
Input is sought from anybody with detailed NTP knowledge, since the working of NTP drift correction may have some subtle interplay with this proposal.
Also, while that code is being altered, some additional log records need
to be added when recovery of each new xlog starts, with timing, to allow
DBAs watching a recovery to calculate expected completion times for the
recovery, which is essential for long recovery situations.
I am also considering any changes that may be required to prepare the way for a future implementation of parallel redo recovery.
Best regards, Simon Riggs, 2ndQuadrant http://www.2ndquadrant.com
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
Simon,
I have one question which might be important: if we use timestamps inside the WAL system to find out where to stop, what happens if somebody changes the system time (e.g. correcting the system clock by calling ntpdate)? Wouldn't that confuse the PITR system? How do you plan to handle that? Unfortunately, time is not something that can be used as a key (at least not from my point of view).
Just some lousy ideas early in the morning ...
Regards,
Hans
-- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/2952/30706 or +43/664/233 90 75 www.cybertec.at, www.postgresql.at, kernel.cybertec.at