Re: [HACKERS] PITR, checkpoint, and local relations
Tom Lane wrote:
> "J. R. Nield" <[EMAIL PROTECTED]> writes:
> >> Uh, why? Why not just force a checkpoint and remember the exact
> >> location of the checkpoint within the current log file?
>
> > If I do a backup with PITR and save it to tape, I need to be able to
> > restore it even if my machine is destroyed in a fire, and all the logs
> > since the end of a backup are destroyed.
>
> And for your next trick, restore it even if the backup tape itself is
> destroyed. C'mon, be a little reasonable here. The backups and the
> log archive tapes are *both* critical data in any realistic view of
> the world.

Tom, just because he doesn't agree with you doesn't mean he is unreasonable. I think it is an admirable goal to allow the PITR backup to restore a consistent copy of the database _without_ needing the logs. In fact, I consider something that _needs_ the logs to restore to a consistent state to be broken.

If you are doing offsite backup, which people should be doing, requiring the log tape for restore means you have to recycle the log tape _after_ the PITR backup, and to restore to a point in the future, you need two log tapes: one that was written during the backup, and another that is current.

If you can restore the PITR backup without a log tape, you can take just the PITR backup tape off site _and_ you can recycle the log tape _before_ the PITR backup, meaning you only need one tape for a restore to a point in the future.

I think there are good reasons to have the PITR backup be restorable on its own, if possible.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes: >> Uh, why? Why not just force a checkpoint and remember the exact >> location of the checkpoint within the current log file? > If I do a backup with PITR and save it to tape, I need to be able to > restore it even if my machine is destroyed in a fire, and all the logs > since the end of a backup are destroyed. And for your next trick, restore it even if the backup tape itself is destroyed. C'mon, be a little reasonable here. The backups and the log archive tapes are *both* critical data in any realistic view of the world. > Is the complexity really that big of a problem with this? Yes, it is. Didn't you just admit to struggling with bugs introduced by exactly this complexity?? I don't care *how* spiffy the backup scheme is, if when push comes to shove my backup doesn't restore because there was a software bug in the backup scheme. In this context there simply is not any virtue greater than "simple and reliable". regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] PITR, checkpoint, and local relations
On Wed, 2002-08-07 at 23:41, Tom Lane wrote: > "J. R. Nield" <[EMAIL PROTECTED]> writes: > > The xlog code must allow us to force an advance to the next log file, > > and truncate the archived file when it's copied so as not to waste > > space. > > Uh, why? Why not just force a checkpoint and remember the exact > location of the checkpoint within the current log file? If I do a backup with PITR and save it to tape, I need to be able to restore it even if my machine is destroyed in a fire, and all the logs since the end of a backup are destroyed. If we don't allow the user to force a log advance, how will he do this? I don't want to copy the log file, and then have the original be written to later, because it will become confusing as to which log file to use. Is the complexity really that big of a problem with this? > > When and if you roll back to a prior checkpoint, you'd want to start the > system running forward with a new xlog file, I think (compare what > pg_resetxlog does). But it doesn't follow that you MUST force an xlog > file boundary simply because you're taking a backup. > > > This complicates both the recovery logic and XLogInsert, and I'm trying > > to kill the "last" latent bug in that feature now. > > Indeed. How about keeping it simple, instead? > > regards, tom lane > -- J. R. Nield [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PITR, checkpoint, and local relations
Sounds like a win all around; make PITR easier and temp tables faster. --- Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > There is debate on whether the local buffers are even valuable > > considering the headache they cause in other parts of the system. > > More specifically, the issue is that when (if) you commit, the contents > of the new table now have to be pushed out to shared storage. This is > moderately annoying in itself (among other things, it implies fsync'ing > those tables before commit). But the real reason it comes up now is > that the proposed PITR scheme can't cope gracefully with tables that > are suddenly there but weren't participating in checkpoints before. > > It looks to me like we should stop using local buffers for ordinary > tables that happen to be in their first transaction of existence. > But, per Vadim's suggestion, we shouldn't abandon the local buffer > manager altogether. What we could and should use it for is TEMP tables, > which have no need to be checkpointed or WAL-logged or fsync'd or > accessible to other backends *ever*. Also, a temp table can leave > blocks in local buffers across transactions, which makes local buffers > considerably more useful than they are now. > > If temp tables didn't use the shared bufmgr nor did updates to them get > WAL-logged, they'd be noticeably more efficient than plain tables, which > IMHO would be a Good Thing. Such tables would be essentially invisible > to WAL and PITR (at least their contents would be --- I assume we'd > still log file creation and deletion). But I can't see anything wrong > with that. > > In short, the proposal runs something like this: > > * Regular tables that happen to be in their first transaction of > existence are not treated differently from any other regular table so > far as buffer management or WAL or PITR go. (rd_myxactonly either goes > away or is used for much less than it is now.) 
> > * TEMP tables use the local buffer manager for their entire existence. > (This probably means adding an "rd_istemp" flag to relcache entries, but > I can't see anything wrong with that.) > > * Local bufmgr semantics are twiddled to reflect this reality --- in > particular, data in local buffers can be held across transactions, there > is no end-of-transaction write (much less fsync). A TEMP table that > isn't too large might never touch disk at all. > > * Data operations in TEMP tables do not get WAL-logged, nor do we > WAL-log page images of local-buffer pages. > > > These changes seem very attractive to me even without regard for making > the world safer for PITR. I'm willing to volunteer to make them happen, > if there are no objections. > > regards, tom lane > -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PITR, checkpoint, and local relations
Bruce Momjian <[EMAIL PROTECTED]> writes: > There is debate on whether the local buffers are even valuable > considering the headache they cause in other parts of the system. More specifically, the issue is that when (if) you commit, the contents of the new table now have to be pushed out to shared storage. This is moderately annoying in itself (among other things, it implies fsync'ing those tables before commit). But the real reason it comes up now is that the proposed PITR scheme can't cope gracefully with tables that are suddenly there but weren't participating in checkpoints before. It looks to me like we should stop using local buffers for ordinary tables that happen to be in their first transaction of existence. But, per Vadim's suggestion, we shouldn't abandon the local buffer manager altogether. What we could and should use it for is TEMP tables, which have no need to be checkpointed or WAL-logged or fsync'd or accessible to other backends *ever*. Also, a temp table can leave blocks in local buffers across transactions, which makes local buffers considerably more useful than they are now. If temp tables didn't use the shared bufmgr nor did updates to them get WAL-logged, they'd be noticeably more efficient than plain tables, which IMHO would be a Good Thing. Such tables would be essentially invisible to WAL and PITR (at least their contents would be --- I assume we'd still log file creation and deletion). But I can't see anything wrong with that. In short, the proposal runs something like this: * Regular tables that happen to be in their first transaction of existence are not treated differently from any other regular table so far as buffer management or WAL or PITR go. (rd_myxactonly either goes away or is used for much less than it is now.) * TEMP tables use the local buffer manager for their entire existence. (This probably means adding an "rd_istemp" flag to relcache entries, but I can't see anything wrong with that.) 
* Local bufmgr semantics are twiddled to reflect this reality --- in particular, data in local buffers can be held across transactions, there is no end-of-transaction write (much less fsync). A TEMP table that isn't too large might never touch disk at all.

* Data operations in TEMP tables do not get WAL-logged, nor do we WAL-log page images of local-buffer pages.

These changes seem very attractive to me even without regard for making the world safer for PITR. I'm willing to volunteer to make them happen, if there are no objections.

regards, tom lane
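The proposal in this message boils down to a routing decision made per relation. A minimal sketch in Python (not PostgreSQL's C code): `rd_istemp` is the flag named in the message, while `Relation`, `choose_buffer_manager` and `needs_wal` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Toy stand-in for a relcache entry (hypothetical, illustration only)."""
    name: str
    rd_istemp: bool = False               # flag proposed in the message
    created_in_current_xact: bool = False


def choose_buffer_manager(rel: Relation) -> str:
    """Only TEMP tables use the local buffer manager, for their whole life.

    A regular table in its first transaction of existence is treated like
    any other regular table and goes through the shared buffers.
    """
    return "local" if rel.rd_istemp else "shared"


def needs_wal(rel: Relation) -> bool:
    """Data operations on TEMP tables are not WAL-logged; all others are."""
    return not rel.rd_istemp
```

Under this sketch a table created in the current transaction gets `"shared"` and is WAL-logged, so it participates in checkpoints from the start, which is the property the PITR scheme needs.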
Re: [HACKERS] PITR, checkpoint, and local relations
Christopher Kings-Lynne wrote:
> > The main area where it seems to get heavy use is during index builds,
> > and for 'CREATE TABLE AS SELECT...'.
> >
> > So I will remove the local buffer manager as part of the PITR patch,
> > unless there is further objection.
>
> Would someone mind filling me in as to what the local buffer manager is and
> how it is different (and not useful) compared to the shared buffer manager?

Sure. I think I can handle that. When you create a table in a transaction, there isn't any committed state for the table yet, so any table modifications are kept in a local buffer, which is local memory to the backend(?). No one needs to see it because it isn't visible to anyone yet. Same for indexes.

Anyway, the WAL activity doesn't handle local buffers the same as shared buffers because there is no crisis if the system crashes.

There is debate on whether the local buffers are even valuable considering the headache they cause in other parts of the system.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] PITR, checkpoint, and local relations
> The main area where it seems to get heavy use is during index builds,
> and for 'CREATE TABLE AS SELECT...'.
>
> So I will remove the local buffer manager as part of the PITR patch,
> unless there is further objection.

Would someone mind filling me in as to what the local buffer manager is and how it is different (and not useful) compared to the shared buffer manager?

Chris
Re: [HACKERS] PITR, checkpoint, and local relations
> > Well, PITR without log archiving could be alternative to > > pg_dump/pg_restore, but I agreed that it's not the big > > feature to worry about. > > Seems like a pointless "feature" to me. A pg_dump dump serves just > as well to capture a snapshot --- in fact better, since it's likely > smaller, definitely more portable, amenable to selective restore, etc. But pg_restore probably will take longer time than copy data files back and re-apply log. > I think we should design the PITR dump to do a good job for PITR, > not a poor job of both PITR and pg_dump. As I already said - agreed -:) Vadim ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PITR, checkpoint, and local relations
Tom Lane wrote: > "Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > >> It should be sufficient to force a checkpoint when you > >> start and when you're done --- altering normal operation in between is > >> a bad design. > > > But you have to prevent log files reusing while you copy data files. > > No, I don't think so. If you are using PITR then you presumably have > some process responsible for archiving off log files on a continuous > basis. The backup process should leave that normal operational behavior > in place, not muck with it. But what if you normally continuous LOG to tape, and now you want to backup to tape. You can't use the same tape drive for both operations. Is that typical? I know sites that had only one tape drive that did that. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] PITR, checkpoint, and local relations
> >> It should be sufficient to force a checkpoint when you > >> start and when you're done --- altering normal operation > in between is > >> a bad design. > > > But you have to prevent log files reusing while you copy data files. > > No, I don't think so. If you are using PITR then you presumably have > some process responsible for archiving off log files on a continuous > basis. The backup process should leave that normal > operational behavior in place, not muck with it. Well, PITR without log archiving could be alternative to pg_dump/pg_restore, but I agreed that it's not the big feature to worry about. Vadim ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PITR, checkpoint, and local relations
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> It should be sufficient to force a checkpoint when you >> start and when you're done --- altering normal operation in between is >> a bad design. > But you have to prevent log files reusing while you copy data files. No, I don't think so. If you are using PITR then you presumably have some process responsible for archiving off log files on a continuous basis. The backup process should leave that normal operational behavior in place, not muck with it. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PITR, checkpoint, and local relations
> I really dislike the notion of turning off checkpointing. What if the
> backup process dies or gets stuck (eg, it's waiting for some operator to
> change a tape, but the operator has gone to lunch)? IMHO, backup
> systems that depend on breaking the system's normal operational behavior
> are broken. It should be sufficient to force a checkpoint when you
> start and when you're done --- altering normal operation in between is
> a bad design.

But you have to prevent log files from being reused while you copy data files. That's why I asked whether the 3 commands from pg_copy are required, and whether the backup couldn't be accomplished by issuing a single command, ALTER SYSTEM BACKUP (even from pgsql), so the backup process would die with the entire system -:)

As for tape changing, maybe we could use some timeout and then just stop the backup process.

Vadim
Re: [HACKERS] PITR, checkpoint, and local relations
> So I think what will work then is pg_copy (hot backup) would: > 1) Issue an ALTER SYSTEM BEGIN BACKUP command which turns on > atomic write, > checkpoints the database and disables further checkpoints (so > wal files > won't be reused) until the backup is complete. > 2) Change ALTER SYSTEM BACKUP DATABASE TO read > the database > directory to find which files it should backup rather than > pg_class and for > each file just use system(cp...) to copy it to the backup directory. Did you consider saving backup on the client host (ie from where pg_copy started)? > 3) ALTER SYSTEM FINISH BACKUP does at it does now and backs > up the pg_xlog > directory and renables database checkpointing. Well, wouldn't be single command ALTER SYSTEM BACKUP enough? What's the point to have 3 commands? (If all of this is already discussed then sorry - I'm not going to start new discussion). Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] PITR, checkpoint, and local relations
Richard Tucker <[EMAIL PROTECTED]> writes: > 1) Issue an ALTER SYSTEM BEGIN BACKUP command which turns on atomic write, > checkpoints the database and disables further checkpoints (so wal files > won't be reused) until the backup is complete. > 2) Change ALTER SYSTEM BACKUP DATABASE TO read the database > directory to find which files it should backup rather than pg_class and for > each file just use system(cp...) to copy it to the backup directory. > 3) ALTER SYSTEM FINISH BACKUP does at it does now and backs up the pg_xlog > directory and renables database checkpointing. > Does this sound right? I really dislike the notion of turning off checkpointing. What if the backup process dies or gets stuck (eg, it's waiting for some operator to change a tape, but the operator has gone to lunch)? IMHO, backup systems that depend on breaking the system's normal operational behavior are broken. It should be sufficient to force a checkpoint when you start and when you're done --- altering normal operation in between is a bad design. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
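The alternative argued for here (checkpoint at the start, plain OS-level copy, checkpoint at the end, nothing disabled in between) can be sketched as follows. This is a minimal illustration, not pg_copy or any real PostgreSQL interface: `run_backup` and its three callables are hypothetical, and the handling of files that vanish mid-copy follows the argument made elsewhere in this thread that log replay takes care of them.

```python
from typing import Callable, Iterable, List


def run_backup(force_checkpoint: Callable[[], None],
               list_files: Callable[[], Iterable[str]],
               copy_file: Callable[[str], None]) -> List[str]:
    """Copy data files between two forced checkpoints.

    Normal operation (including ordinary checkpoints and log archiving)
    is left untouched while the copy runs, so a stuck or dead backup
    process cannot wedge the server.
    """
    force_checkpoint()                 # known starting point in the log
    copied: List[str] = []
    for path in list_files():          # plain directory listing, no catalog reads
        try:
            copy_file(path)            # OS-level copy, bypassing the buffer manager
        except FileNotFoundError:
            continue                   # dropped meanwhile; replay covers it
        copied.append(path)
    force_checkpoint()                 # marks how far the log must be replayed
    return copied
```

If the backup process dies halfway, nothing has to be re-enabled on the server side, which is the point of the objection to disabling checkpoints.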
Re: [HACKERS] PITR, checkpoint, and local relations
> Are you sure this is true for all ports?

Well, maybe you're right and it's not. But with "after-image blocks in log after checkpoint" you really shouldn't worry about block atomicity, right? And the ability to turn block logging on/off, as suggested by Richard, looks appropriate for everyone, doesn't it?

> And if so, why would it be cheaper for the kernel to do it in
> its buffer manager, compared to us doing it in ours? This just
> seems bogus to rely on. Does anyone know what POSIX has to say
> about this?

Does "doing it in ours" mean reading all data files through our shared buffer pool? Sorry, I just don't see the point in this when tar etc. will work just fine. At least for the first release tar is SuperOK, because there must be and will be other problems/bugs, unrelated to how to read data files, and so the sooner we start testing the better.

Vadim
Re: [HACKERS] PITR, checkpoint, and local relations
Are you sure this is true for all ports? And if so, why would it be cheaper for the kernel to do it in its buffer manager, compared to us doing it in ours? This just seems bogus to rely on. Does anyone know what POSIX has to say about this? On Fri, 2002-08-02 at 18:01, Mikheev, Vadim wrote: > > > How do you get atomic block copies otherwise? > > > > Eh? The kernel does that for you, as long as you're reading the > > same-size blocks that the backends are writing, no? > > Good point. > > Vadim > -- J. R. Nield [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PITR, checkpoint, and local relations
> > > As long as whole block is saved in log on first after > > > checkpoint (you made before backup) change to block. > > > I thought half the point of PITR was to be able to > turn off pre-image logging so you can trade potential > recovery time for speed without fear of data-loss. > Didn't we have this discussion before? > Suppose you can turn off/on PostgreSQL's atomic write on > the fly. Which means turning on or off whether XLoginsert > writes a copy of the block into the log file upon first > modification after a checkpoint. > So ALTER SYSTEM BEGIN BACKUP would turn on atomic write > and then checkpoint the database. > So while the OS copy of the data files is going on the > atomic write would be enabled. So any read of a partial > write would be fixed up by the usual crash recovery mechanism. Yes, simple way to satisfy everyone. Vadim ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PITR, checkpoint, and local relations
> > How do you get atomic block copies otherwise? > > Eh? The kernel does that for you, as long as you're reading the > same-size blocks that the backends are writing, no? Good point. Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] PITR, checkpoint, and local relations
> > You don't need it. > > As long as whole block is saved in log on first after > > checkpoint (you made before backup) change to block. > > I thought half the point of PITR was to be able to turn > off pre-image logging so you can trade potential recovery Correction - *after*-image. > time for speed without fear of data-loss. Didn't we have > this discussion before? Sorry, I missed this. So, it's already discussed what to do about partial block updates? When system crashed just after LSN, but not actual tuple etc, was stored in on-disk block and on restart you compare log record' LSN with data block' LSN, they are equal and so you *assume* that actual data are in place too, what is not the case? I always thought that the whole point of PITR is to be able to restore DB fast (faster than pg_restore) *AND* up to the last committed transaction (assuming that log is Ok). Vadim ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
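The partial-block case described here can be made concrete with a toy model. This is a minimal sketch, not PostgreSQL's recovery code: `Page`, `LogRecord` and `redo` are hypothetical, and the model assumes the usual rule that a record is skipped when the block's LSN is not older than the record's LSN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    lsn: int      # LSN stored in the block header
    data: str     # stands in for the tuples on the block


@dataclass
class LogRecord:
    lsn: int
    new_data: str
    full_page_image: Optional[Page] = None   # after-image of the whole block


def redo(page: Page, rec: LogRecord) -> Page:
    """Apply one log record to one block during recovery."""
    if rec.full_page_image is not None:
        # Whole block was logged on its first change after the checkpoint:
        # overwrite whatever is on disk, torn or not.
        return Page(rec.full_page_image.lsn, rec.full_page_image.data)
    if page.lsn >= rec.lsn:
        # LSNs say the change is already there, so it is *assumed* applied.
        return page
    return Page(rec.lsn, rec.new_data)


# Torn write: the header with the new LSN reached disk, the tuple did not.
torn = Page(lsn=10, data="old")
print(redo(torn, LogRecord(10, "new")).data)
print(redo(torn, LogRecord(10, "new", full_page_image=Page(10, "new"))).data)
```

The first call leaves the stale data in place because the LSNs match; the second restores the block from the logged after-image, which is why whole-block logging after a checkpoint removes the need for atomic block copies.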
Re: [HACKERS] PITR, checkpoint, and local relations
> > So, we only have to use shared buffer pool for local (but probably
> > not for temporary) relations to close this issue, yes? I personally
> > don't see any performance issues if we do this.
>
> Hmm. Temporary relations are a whole different story.
>
> It would be nice if updates on temp relations never got WAL-logged at
> all, but I'm not sure how feasible that is. Right now we don't really

There is no point in logging them.

> distinguish temp relations from ordinary ones --- in particular, they
> have pg_class entries, which surely will get WAL-logged even if we
> persuade the buffer manager not to do it for the data pages. Is that
> a problem? Not sure.

It was not about any problem. I just mean that the local buffer pool could still be used for temporary relations if someone thinks that makes any sense. Anyone?

Vadim
Re: [HACKERS] PITR, checkpoint, and local relations
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > So, we only have to use shared buffer pool for local (but probably > not for temporary) relations to close this issue, yes? I personally > don't see any performance issues if we do this. Hmm. Temporary relations are a whole different story. It would be nice if updates on temp relations never got WAL-logged at all, but I'm not sure how feasible that is. Right now we don't really distinguish temp relations from ordinary ones --- in particular, they have pg_class entries, which surely will get WAL-logged even if we persuade the buffer manager not to do it for the data pages. Is that a problem? Not sure. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes: >> (In particular, I *strongly* object to using the buffer manager at all >> for reading files for backup. That's pretty much guaranteed to blow out >> buffer cache. Use plain OS-level file reads. An OS directory search >> will do fine for finding what you need to read, too.) > How do you get atomic block copies otherwise? Eh? The kernel does that for you, as long as you're reading the same-size blocks that the backends are writing, no? regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] PITR, checkpoint, and local relations
On Fri, 2002-08-02 at 16:59, Mikheev, Vadim wrote: > You don't need it. > As long as whole block is saved in log on first after > checkpoint (you made before backup) change to block. I thought half the point of PITR was to be able to turn off pre-image logging so you can trade potential recovery time for speed without fear of data-loss. Didn't we have this discussion before? How is this any worse than a table scan? -- J. R. Nield [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PITR, checkpoint, and local relations
> > (In particular, I *strongly* object to using the buffer > manager at all > > for reading files for backup. That's pretty much > guaranteed to blow out > > buffer cache. Use plain OS-level file reads. An OS > directory search > > will do fine for finding what you need to read, too.) > > How do you get atomic block copies otherwise? You don't need it. As long as whole block is saved in log on first after checkpoint (you made before backup) change to block. Vadim ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PITR, checkpoint, and local relations
> > The predicate for files we MUST (fuzzy) copy is: > > File exists at start of backup && File exists at end of backup > > Right, which seems to me to negate all these claims about needing a > (horribly messy) way to read uncommitted system catalog entries, do > blind reads, etc. What's wrong with just exec'ing tar after having > done a checkpoint? Right. It looks like insert/update/etc ops over local relations are WAL-logged, and it's Ok (we have to do this). So, we only have to use shared buffer pool for local (but probably not for temporary) relations to close this issue, yes? I personally don't see any performance issues if we do this. Vadim ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PITR, checkpoint, and local relations
On Fri, 2002-08-02 at 16:01, Tom Lane wrote:
> "J. R. Nield" <[EMAIL PROTECTED]> writes:
> > The predicate for files we MUST (fuzzy) copy is:
> > File exists at start of backup && File exists at end of backup
>
> Right, which seems to me to negate all these claims about needing a
> (horribly messy) way to read uncommitted system catalog entries, do
> blind reads, etc. What's wrong with just exec'ing tar after having
> done a checkpoint?
>

There is no need to read uncommitted system catalog entries. Just take a snapshot of the directory to get the OIDs. You don't care whether they get deleted before you get to them, because the log will take care of that.

> (In particular, I *strongly* object to using the buffer manager at all
> for reading files for backup. That's pretty much guaranteed to blow out
> buffer cache. Use plain OS-level file reads. An OS directory search
> will do fine for finding what you need to read, too.)

How do you get atomic block copies otherwise?

>
> regards, tom lane
>

--
J. R. Nield
[EMAIL PROTECTED]
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes: > The predicate for files we MUST (fuzzy) copy is: > File exists at start of backup && File exists at end of backup Right, which seems to me to negate all these claims about needing a (horribly messy) way to read uncommitted system catalog entries, do blind reads, etc. What's wrong with just exec'ing tar after having done a checkpoint? (In particular, I *strongly* object to using the buffer manager at all for reading files for backup. That's pretty much guaranteed to blow out buffer cache. Use plain OS-level file reads. An OS directory search will do fine for finding what you need to read, too.) regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PITR, checkpoint, and local relations
On Fri, 2002-08-02 at 13:50, Richard Tucker wrote:
> pg_copy does not handle "local relations" as you would suspect. To find the
> tables and indexes to backup the backend in processing the "ALTER SYSTEM
> BACKUP" statement reads the pg_class table. Any tables in the process of
> coming into existence of course are not visible. If somehow they were then
> the backup would back up their contents. Any in private memory changes
> would be captured during crash recovery on the copy of the database. So the
> question is: is it possible to read the names of the "local relations" from
> the pg_class table even though their creation has not yet been committed?
> -regards
> richt
>

No, not really. At least not a consistent view.

The way to do this is to use the filesystem to discover the relfilenodes. There are a couple of ways to deal with the problem of files being pulled out from under you, but you have to be careful about what the buffer manager does when a file gets dropped.

The predicate for files we MUST (fuzzy) copy is:

    File exists at start of backup && File exists at end of backup

Any other file, while it may be copied, doesn't need to be in the backup because either it will be created and rebuilt during play-forward recovery, or it will be deleted during play-forward recovery, or both, assuming those operations are logged. They really must be logged to do what we want to do.

Also, you can't use the normal relation_open stuff, because local relations will not have a catalog entry, and it looks like there are catcache/sinval issues that I haven't completely covered. So you've got to do 'blind reads' through the buffer manager. That involves a minor extension to the buffer manager if local relations go through the shared buffers, or coordinating with the local buffer manager if they continue to work as they do now, which involves major changes.

We also have to checkpoint at the start, and flush the log at the end.

--
J. R. Nield
[EMAIL PROTECTED]
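The copy predicate stated in this message is a set intersection over two directory snapshots. A minimal sketch, assuming the data files live under a single directory tree; `list_data_files` and `must_copy` are hypothetical helper names, not part of pg_copy.

```python
import os
from typing import Set


def list_data_files(data_dir: str) -> Set[str]:
    """Snapshot the file names under data_dir using only the filesystem."""
    found: Set[str] = set()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), data_dir))
    return found


def must_copy(at_start: Set[str], at_end: Set[str]) -> Set[str]:
    """Files that existed at the start AND at the end of the backup.

    Anything created or dropped in between is rebuilt or removed by
    play-forward recovery, provided those operations are logged.
    """
    return at_start & at_end
```

For example, with snapshots `{"16384", "16385"}` at the start and `{"16385", "16390"}` at the end, only `"16385"` has to be present in the backup; the file names here are made up.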
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes:
> What would happen if a transaction with a local relation commits during
> backup, and there are log entries inserting the catalog tuples into
> pg_class. Should I not apply those on restore? How do I know?

This is certainly a non-problem. You see a WAL log entry, you apply it.
Whether the transaction actually commits later is not your concern (at
least not at that point).

> This problem is subtle, and I'm maybe having difficulty explaining it
> properly. Do you understand the issue I'm raising? Have I made some kind
> of blunder, so that this is really not a problem?

After thinking more, I think you are right, but you didn't explain it
well. The problem is not really relevant to PITR at all, but is a hole
in the initial design of WAL. Consider

	transaction starts
	transaction creates local rel
	transaction writes in local rel...
	CHECKPOINT
	transaction writes in local rel...
	CHECKPOINT
	transaction writes in local rel...
	transaction flushes local rel pages to disk
	transaction commits
	system crash

We'll try to replay the log from the latest checkpoint. This works only
if all the local-rel page flushes actually made it to disk, otherwise
the updates of the local rel that happened before the last checkpoint
may be lost. (I think there is still an fsync in local-rel commit to
ensure the flushes happen, but it's sure messy to do it that way.)

We could possibly fix this by logging the local-rel-flush page writes
themselves in the WAL log, but that'd probably more than ruin the
efficiency advantage of the local bufmgr. So I'm back to the idea that
removing it is the way to go. Certainly that would provide nontrivial
simplifications in a number of places (no tests on local vs global
buffer anymore, no special cases for local rel commit, etc).

Might be useful to temporarily dike it out and see what the penalty for
building a large index is.

			regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
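
The crash sequence in the message above can be sketched as a toy simulation. This is a minimal Python illustration under stated assumptions, not PostgreSQL code: Page, ToyDb, and run_scenario are hypothetical names, the "relation" is a single page, and a checkpoint is modeled as nothing more than advancing the redo pointer without flushing the local buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0                                  # LSN of the last change applied
    values: list = field(default_factory=list)


@dataclass
class ToyDb:
    wal: list = field(default_factory=list)       # (lsn, value) records for one local-rel page
    disk: Page = field(default_factory=Page)      # durable copy of the page
    local: Page = field(default_factory=Page)     # copy held in the backend-local buffer
    redo_start: int = 0                           # replay applies records with lsn > redo_start

    def write_local(self, value):
        # Log the change, then apply it to the local buffer only.
        lsn = len(self.wal) + 1
        self.wal.append((lsn, value))
        self.local.values.append(value)
        self.local.lsn = lsn

    def checkpoint(self):
        # Assumption: the local buffer is NOT flushed; only the redo pointer moves.
        self.redo_start = len(self.wal)

    def flush_local(self, reached_disk):
        # The flush may or may not have made it to disk before the crash.
        if reached_disk:
            self.disk = Page(self.local.lsn, list(self.local.values))

    def crash_and_replay(self):
        # Start from the durable page and replay WAL after the latest checkpoint.
        page = Page(self.disk.lsn, list(self.disk.values))
        for lsn, value in self.wal:
            if lsn > self.redo_start and lsn > page.lsn:
                page.values.append(value)
                page.lsn = lsn
        return page.values


def run_scenario(flush_reached_disk):
    db = ToyDb()
    db.write_local("a")
    db.checkpoint()
    db.write_local("b")
    db.checkpoint()
    db.write_local("c")
    db.flush_local(flush_reached_disk)
    return db.crash_and_replay()
```

In this model, run_scenario(True) recovers all three writes, while run_scenario(False) recovers only the write logged after the last checkpoint. The earlier writes are in the log but sit before the replay start point, which is the hole being described.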
Re: [HACKERS] PITR, checkpoint, and local relations
On Fri, 2002-08-02 at 10:01, Tom Lane wrote:
>
> Just out of curiosity, though, what does it matter? On re-reading your
> message I think you are dealing with a non problem, or at least the
> wrong problem. Local relations do not need to be checkpointed, because
> by definition they were created by a transaction that hasn't committed
> yet. They must be, and are, checkpointed to disk before the transaction
> commits; but up till that time, if you have a crash then the entire
> relation should just go away.

What happens when we have a local file that is created before the
backup, and it becomes global during the backup?

In order to copy this file, I either need:

1) A copy of all its blocks at the time backup started (or later), plus
all log records between then and the end of the backup.

OR

2) All the log records from the time the local file was created until
the end of the backup.

In the case of an idle uncommitted transaction that suddenly commits
during backup, case 2 might be very far back in the log file. In fact,
the log file might be archived to tape by then.

So I must do case 1, and checkpoint the local relations.

This brings up the question: why do I need to bother backing up files
that were local before the backup started, but became global during the
backup.

We already know that for the backup to be consistent after we restore
it, we must play the logs forward to the completion of the backup to
repair our "fuzzy copies" of the database files. Since the transaction
that makes the local-file into a global one has committed during our
backup, its log entries will be played forward as well.

What would happen if a transaction with a local relation commits during
backup, and there are log entries inserting the catalog tuples into
pg_class. Should I not apply those on restore? How do I know?

>
> That mechanism is there already --- perhaps it needs a few tweaks for
> PITR but I do not see any need for cross-backend flush commands for
> local relations.
>

This problem is subtle, and I'm maybe having difficulty explaining it
properly. Do you understand the issue I'm raising? Have I made some kind
of blunder, so that this is really not a problem?

-- 
J. R. Nield
[EMAIL PROTECTED]
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes:
> Ok. This is what I wanted to hear, but I had assumed someone decided to
> put it in for a reason, and I wasn't going to submit a patch to pull-out
> the local buffer manager without clearing it first.

> The main area where it seems to get heavy use is during index builds,

Yeah. I do not think it really saves any I/O: unless you abort your
index build, the data is eventually going to end up on disk anyway.
What it saves is contention for shared buffers (the overhead of
acquiring BufMgrLock, for example).

Just out of curiosity, though, what does it matter? On re-reading your
message I think you are dealing with a non problem, or at least the
wrong problem. Local relations do not need to be checkpointed, because
by definition they were created by a transaction that hasn't committed
yet. They must be, and are, checkpointed to disk before the transaction
commits; but up till that time, if you have a crash then the entire
relation should just go away.

That mechanism is there already --- perhaps it needs a few tweaks for
PITR but I do not see any need for cross-backend flush commands for
local relations.

			regards, tom lane
Re: [HACKERS] PITR, checkpoint, and local relations
Ok. This is what I wanted to hear, but I had assumed someone decided to
put it in for a reason, and I wasn't going to submit a patch to pull-out
the local buffer manager without clearing it first.

The main area where it seems to get heavy use is during index builds,
and for 'CREATE TABLE AS SELECT...'.

So I will remove the local buffer manager as part of the PITR patch,
unless there is further objection.

On Fri, 2002-08-02 at 00:49, Tom Lane wrote:
> "J. R. Nield" <[EMAIL PROTECTED]> writes:
> > I am working on a way to do this with a signal, using holdoffs around
> > calls into the storage-manager and VFS layers to prevent re-entrant
> > calls. The local buffer manager is simple enough that it should be
> > possible to flush them from within a signal handler at most times, but
> > the VFS and storage manager are not safe to re-enter from a handler.
>
> > Does this sound like a good idea?
>
> No. What happened to "simple"?
>
> Before I'd accept anything like that, I'd rip out the local buffer
> manager and just do everything in the shared manager. I've never
> seen any proof that the local manager buys any noticeable performance
> gain anyway ... how many people really do anything much with a table
> during its first transaction of existence?
>
> 			regards, tom lane
>
-- 
J. R. Nield
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Re: [HACKERS] PITR, checkpoint, and local relations
"J. R. Nield" <[EMAIL PROTECTED]> writes:
> I am working on a way to do this with a signal, using holdoffs around
> calls into the storage-manager and VFS layers to prevent re-entrant
> calls. The local buffer manager is simple enough that it should be
> possible to flush them from within a signal handler at most times, but
> the VFS and storage manager are not safe to re-enter from a handler.

> Does this sound like a good idea?

No. What happened to "simple"?

Before I'd accept anything like that, I'd rip out the local buffer
manager and just do everything in the shared manager. I've never
seen any proof that the local manager buys any noticeable performance
gain anyway ... how many people really do anything much with a table
during its first transaction of existence?

			regards, tom lane
Re: [HACKERS] PITR, checkpoint, and local relations
On Thu, 2002-08-01 at 17:14, Bruce Momjian wrote:
>
> J.R needs comments on this. PITR has problems because local relations
> aren't logged to WAL. Suggestions?
>

I'm sorry if it wasn't clear. The issue is not that local relations
aren't logged to WAL, they are. The issue is that you can't checkpoint
them. That means if you need a lower bound on the LSN to recover from,
then you either need to wait for transactions using them all to commit
and flush their local buffers, or there needs to be an async way to tell
them all to flush.

I am working on a way to do this with a signal, using holdoffs around
calls into the storage-manager and VFS layers to prevent re-entrant
calls. The local buffer manager is simple enough that it should be
possible to flush them from within a signal handler at most times, but
the VFS and storage manager are not safe to re-enter from a handler.

Does this sound like a good idea?

-- 
J. R. Nield
[EMAIL PROTECTED]
Re: [HACKERS] PITR, checkpoint, and local relations
J.R needs comments on this. PITR has problems because local relations
aren't logged to WAL. Suggestions?

---

J. R. Nield wrote:
> As per earlier discussion, I'm working on the hot backup issues as part
> of the PITR support. While I was looking at the buffer manager and the
> relcache/MyDb issues to figure out the best way to work this, it
> occurred to me that PITR will introduce a big problem with the way we
> handle local relations.
>
> The basic problem is that local relations (rd_myxactonly == true) are
> not part of a checkpoint, so there is no way to get a lower bound on the
> starting LSN needed to recover a local relation. In the past this did
> not matter, because either the local file would be (effectively)
> discarded during recovery because it had not yet become visible, or the
> file would be flushed before the transaction creating it made it
> visible. Now this is a problem.
>
> So I need a decision from the core team on what to do about the local
> buffer manager. My preference would be to forget about the local buffer
> manager entirely, or if not that then to allow it only for _true_
> temporary data. The only alternative I can devise is to create some way
> for all other backends to participate in a checkpoint, perhaps using a
> signal. I'm not sure this can be done safely.
>
> Anyway, I'm glad the tuplesort stuff doesn't try to use relation files
> :-)
>
> Can the core team let me know if this is acceptable, and whether I
> should move ahead with changes to the buffer manager (and some other
> stuff) needed to avoid special treatment of rd_myxactonly relations?
>
> Also to Richard: have you guys at multera dealt with this issue already?
> Is there some way around this that I'm missing?
>
>
> Regards,
>
> 	John Nield
>
>
> Just as an example of this problem, imagine the following sequence:
>
> 1) Transaction TX1 creates a local relation LR1 which will eventually
> become a globally visible table. Tuples are inserted into the local
> relation, and logged to the WAL file. Some tuples remain in the local
> buffer cache and are not yet written out, although they are logged. TX1
> is still in progress.
>
> 2) Backup starts, and checkpoint is called to get a minimum starting LSN
> (MINLSN) for the backed-up files. Only the global buffers are flushed.
>
> 3) Backup process copies LR1 into the backup directory. (postulate some
> way of coordinating with the local buffer manager, a problem I have not
> solved).
>
> 4) TX1 commits and flushes its local buffers. A dirty buffer exists
> whose LSN is before MINLSN. LR1 becomes globally visible.
>
> 5) Backup finishes copying all the files, including the local relations,
> and then flushes the log. The log files between MINLSN and the current
> LSN are copied to the backup directory, and backup is done.
>
> 6) Sometime later, a system administrator restores the backup and plays
> the logs forward starting at MINLSN. LR1 will be corrupt, because some
> of the log entries required for its restoration will be before MINLSN.
> This corruption will not be detected until something goes wrong.
>
> BTW: The problem doesn't only happen with backup! It occurs at every
> checkpoint as well, I just missed it until I started working on the hot
> backup issue.
>
> --
> J. R. Nield
> [EMAIL PROTECTED]
>

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
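
The six-step example quoted above comes down to a starting LSN that is too late for LR1. A rough Python sketch of that arithmetic follows. The function names and the LSN values (100, 120, 140, 500) are made up for illustration, and tracking the LSN that first dirtied an unflushed buffer is an assumption of this sketch, not something the thread says the buffer manager does.

```python
def safe_redo_start(checkpoint_lsn, unflushed_first_dirty_lsns):
    """Oldest LSN that replay must start from so no unflushed change is skipped.

    unflushed_first_dirty_lsns: for each buffer still dirty at checkpoint time
    (shared or local), the LSN of the first record that dirtied it.
    """
    return min([checkpoint_lsn, *unflushed_first_dirty_lsns])


def records_missed(record_lsns, replay_start):
    """Records that a replay beginning at replay_start would never see."""
    return [lsn for lsn in record_lsns if lsn < replay_start]


# Step 1: TX1 inserts into LR1; the records are logged but the pages stay
# in TX1's local buffers (hypothetical LSNs).
lr1_record_lsns = [100, 120, 140]

# Step 2: the backup's checkpoint flushes only global buffers and picks MINLSN.
min_lsn = 500

# Step 6: replaying from MINLSN skips every LR1 record logged before it.
missed = records_missed(lr1_record_lsns, min_lsn)

# If the local buffer's first-dirty LSN were counted, the bound would move back.
safe = safe_redo_start(min_lsn, [lr1_record_lsns[0]])
```

Here missed contains all three LR1 records, while a replay starting at safe would miss none. Getting that earlier bound requires either flushing the local buffers at checkpoint time or knowing what they hold, which is the coordination problem the thread is discussing.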