I remember Heikki mentioned improving redo recovery in one of the emails in the past, so I know people are already thinking about this. I have some ideas and just wanted to get comments here.
ISTM that it's important to keep the redo recovery time as small as possible in order to reduce the downtime in case of unplanned maintenance. One way to do this is to take checkpoints very aggressively, so that the amount of redo work stays small. But the current checkpoint logic writes all the dirty buffers to disk and hence generates a lot of IO. That limits our ability to take very frequent checkpoints.

The current redo recovery is a single-threaded, synchronous process. The XLOG is read sequentially; each log record is examined and replayed if required. This requires reading disk blocks into the shared buffers and applying changes to the buffers. The reads happen synchronously, and that would usually make the redo process very slow.

What I am thinking is that if we can read these blocks ahead into the shared buffers and then apply the redo changes to them, it can potentially improve things a lot. If there are multiple outstanding read requests, the kernel (or the controller?) can probably schedule the reads more efficiently.

One way to do this is to read ahead in the XLOG and make asynchronous read requests for these blocks. But I am not sure if we support asynchronous reads yet. Another (and maybe easier) way is to fork another process which just reads ahead in the XLOG and gets the blocks into memory while the other process does the normal redo recovery.

One obvious downside of reading ahead is that we may need to jump backward and forward in the XLOG file, which is otherwise read sequentially. But that can be handled by using XLOG buffers for redo.

Btw, isn't our redo recovery completely physical in nature? I mean, can we replay the redo logs related to one block independently of other blocks? I am asking because, if that's the case, ISTM we can introduce parallelism in recovery by splitting and reordering the xlog records and then running multiple processes to do the redo recovery.
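To make the read-ahead idea a bit more concrete, here is a rough sketch in Python. It is not PostgreSQL code. `Record`, `hint_block`, `replay_with_readahead` and the `window` parameter are all hypothetical names, and it assumes each log record touches exactly one block. It also only asks the kernel to pull blocks into the OS page cache via `posix_fadvise`, which is a weaker thing than getting them into shared buffers.

```python
import os
from collections import deque
from typing import Callable, Iterable, NamedTuple

BLOCK_SIZE = 8192  # PostgreSQL's default block size


class Record(NamedTuple):
    """Hypothetical, simplified log record: one record touches one block."""
    lsn: int
    path: str      # data file that holds the block
    block_no: int  # block number within that file


def hint_block(rec: Record) -> None:
    """Ask the kernel to start reading the block, without waiting for it.

    Unix only. This warms the OS page cache, not shared buffers.
    """
    fd = os.open(rec.path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, rec.block_no * BLOCK_SIZE, BLOCK_SIZE,
                         os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)


def replay_with_readahead(records: Iterable[Record],
                          apply: Callable[[Record], None],
                          hint: Callable[[Record], None] = hint_block,
                          window: int = 32) -> None:
    """Replay records in log order, hinting blocks up to `window` records ahead."""
    pending = deque()
    for rec in records:
        hint(rec)              # issue the read request early
        pending.append(rec)
        if len(pending) > window:
            apply(pending.popleft())  # oldest record; its read had a head start
    while pending:             # drain the tail of the log
        apply(pending.popleft())
```

Replay order is unchanged here; only the read requests are issued early, so several of them are outstanding at once and the kernel gets a chance to schedule them together. The separate-process variant would run the hinting loop in a forked helper instead of inline.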
Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com