Re: [HACKERS] CREATE DATABASE vs delayed table unlink
The error on createdb happened again this morning. However, this time an abandoned directory was not created. The full error message was:

$ createdb -E SQL_ASCII -U flyminebuild -h brian.flymine.org -T production-flyminebuild production-flyminebuild:uniprot
createdb: database creation failed: ERROR: could not stat file "base/33049747/33269704": No such file or directory

However, my colleagues promptly dropped the database that was being copied and restarted the build process, so I can't diagnose anything. Suffice to say that there is no abandoned directory, and the directory 33049747 no longer exists either.

I'll try again to get some details next time it happens.

Matthew

--
$ rm core
Segmentation Fault (core dumped)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Matthew Wakeling <[EMAIL PROTECTED]> writes:
> On Thu, 9 Oct 2008, Tom Lane wrote:
>> So I'm mystified
>> how Matthew could have seen the expected error and yet had the
>> destination tree (or at least large chunks of it) left behind.

> Remember I was running 8.3.0, and you mentioned a few changes after that
> version which would have made sure the destination tree was cleaned up
> properly.

Well, there were some fixes for the case of a SIGTERM shutdown, but I still don't see how 8.3.0 (or any PG version for some time back) could report the file-not-found-in-source-tree failure without having passed through the cleanup code.

There's some possibility that it tried to clean up and got a failure (which would be reported as a WARNING, which conceivably you didn't note) ... but it's kind of hard to see what failure it could get from deleting files it just created. Is there anything weird about the ownership/permissions on the orphaned directories and files?

regards, tom lane
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Another thought is to ignore ENOENT in copydir.

On Wed, 8 Oct 2008, Tom Lane wrote:
> Yeah, I thought about that too, but it seems extremely dangerous ...

I agree. If a file randomly goes missing, that's not an error to ignore, even if you think the only way that could happen is safe.

I could be wrong - but couldn't other bad things happen too? If you're copying the files before the checkpoint has completed, couldn't the new database end up with some of the recent changes going missing? Or is that prevented by FlushDatabaseBuffers?

Matthew

--
Isn't "Microsoft Works" something of a contradiction?
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
On Thu, 9 Oct 2008, Tom Lane wrote:
> So I'm mystified how Matthew could have seen the expected error and yet had
> the destination tree (or at least large chunks of it) left behind.

Remember I was running 8.3.0, and you mentioned a few changes after that version which would have made sure the destination tree was cleaned up properly.

> [ thinks for a bit... ] We know there were multiple occurrences. Matthew, is
> it possible that you had other createdb failures that did *not* report "file
> does not exist"? For instance, a createdb interrupted by a "fast" database
> shutdown might have left things this way.

Well, we didn't have any fast database shutdowns or power failures. I don't think so.

Matthew

--
Heat is work, and work's a curse. All the heat in the universe, it's going to cool down, because it can't increase, then there'll be no more work, and there'll be perfect peace.
  -- Michael Flanders
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I committed a patch to do a full-blown checkpoint before the copy.
> Annoying to do two checkpoints, but CREATE DATABASE is a pretty
> heavy-weight operation anyway. I don't see any other solution at the
> moment, at least not one that we could back-patch.

Agreed. Patch looks good.

I tried to reproduce the issue here using yesterday's CVS HEAD. It is not hard to get the "file does not exist" failure, but so far as I can tell CREATE DATABASE does clean up the target directory before reporting that failure to the user. It is probably possible to interrupt the cleanup, but if that happened then the original error message wouldn't ever get delivered at all. So I'm mystified how Matthew could have seen the expected error and yet had the destination tree (or at least large chunks of it) left behind.

[ thinks for a bit... ] We know there were multiple occurrences. Matthew, is it possible that you had other createdb failures that did *not* report "file does not exist"? For instance, a createdb interrupted by a "fast" database shutdown might have left things this way.

regards, tom lane
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Matthew Wakeling wrote:
>> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>>> Another thought is to ignore ENOENT in copydir.
>
> On Wed, 8 Oct 2008, Tom Lane wrote:
>> Yeah, I thought about that too, but it seems extremely dangerous ...
>
> I agree. If a file randomly goes missing, that's not an error to ignore, even
> if you think the only way that could happen is safe.

I committed a patch to do a full-blown checkpoint before the copy. Annoying to do two checkpoints, but CREATE DATABASE is a pretty heavy-weight operation anyway. I don't see any other solution at the moment, at least not one that we could back-patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
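[Editorial sketch] The effect of checkpointing before the copy can be shown with a toy model. This is Python, not PostgreSQL's C code, and it only models the ordering discussed in this thread: files of dropped tables are unlinked at the next checkpoint, and the copy lists the directory before touching each entry. The names `PendingUnlinks`, `copy_names`, `run`, and the file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile


class PendingUnlinks:
    """Toy model: dropped files are only unlinked at the next checkpoint."""

    def __init__(self):
        self.paths = []

    def drop(self, path):
        self.paths.append(path)  # remember the file, do not unlink yet

    def checkpoint(self):
        for path in self.paths:
            os.unlink(path)
        self.paths.clear()


def copy_names(src_dir, pending, checkpoint_first):
    if checkpoint_first:
        pending.checkpoint()  # settle delayed unlinks before listing
    names = sorted(os.listdir(src_dir))
    pending.checkpoint()  # a checkpoint that happens to land mid-copy
    for name in names:
        os.stat(os.path.join(src_dir, name))  # raises if the file vanished
    return names


def run(checkpoint_first):
    with tempfile.TemporaryDirectory() as src:
        for name in ("16384", "16385"):  # hypothetical relation file names
            with open(os.path.join(src, name), "wb"):
                pass
        pending = PendingUnlinks()
        pending.drop(os.path.join(src, "16385"))
        try:
            return copy_names(src, pending, checkpoint_first)
        except FileNotFoundError:
            return None
```

In this model `run(False)` returns `None`, because the listed file disappears before it is visited, while `run(True)` returns only the surviving file, because the pending unlink has already been carried out when the directory is listed.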
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Matthew Wakeling wrote:
> I could be wrong - but couldn't other bad things happen too? If you're
> copying the files before the checkpoint has completed, couldn't the new
> database end up with some of the recent changes going missing? Or is that
> prevented by FlushDatabaseBuffers?

FlushDatabaseBuffers prevents that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Another thought is to ignore ENOENT in copydir.

Yeah, I thought about that too, but it seems extremely dangerous ...

regards, tom lane
Re: [HACKERS] CREATE DATABASE vs delayed table unlink
Tom Lane wrote:
> The thread here
> http://archives.postgresql.org/pgsql-performance/2008-10/msg00031.php
> illustrates an undesirable side effect of the recent patch to delay table
> file unlinks to the next checkpoint. What is evidently happening is that
> copydir() fetches a block of a directory, and by the time it arrives at some
> particular entry in the block, a checkpoint has happened and that file got
> removed. If there are some large files in the directory then the window for
> this race condition can be wide.
>
> The only real solution I can see is to replace createdb()'s
> FlushDatabaseBuffers call with a full-blown checkpoint. It's pretty annoying
> to do *two* checkpoints in a CREATE DATABASE, but as long as we're doing this
> via filesystem-based APIs we probably haven't got much choice.

Hmph, that is pretty annoying. An extra checkpoint seems like the easy solution.

Another thought is to ignore ENOENT in copydir. But then you'd still copy all the lingering empty files, which would never be deleted. They'd be zero-length, and you can end up with orphaned files anyway in crash scenarios, but it'd still be annoying.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] CREATE DATABASE vs delayed table unlink
The thread here
http://archives.postgresql.org/pgsql-performance/2008-10/msg00031.php
illustrates an undesirable side effect of the recent patch to delay table file unlinks to the next checkpoint. What is evidently happening is that copydir() fetches a block of a directory, and by the time it arrives at some particular entry in the block, a checkpoint has happened and that file got removed. If there are some large files in the directory then the window for this race condition can be wide.

The only real solution I can see is to replace createdb()'s FlushDatabaseBuffers call with a full-blown checkpoint. It's pretty annoying to do *two* checkpoints in a CREATE DATABASE, but as long as we're doing this via filesystem-based APIs we probably haven't got much choice.

Comments?

regards, tom lane
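[Editorial sketch] The race described above can be reproduced in miniature outside the server. The following Python is not copydir() itself; it only mimics the pattern of reading directory entries first and visiting each one later. `stat_entries`, `demo_race`, and the file names are hypothetical, and the `after_listing` callback stands in for a checkpoint that carries out a delayed unlink while the copy is in progress.

```python
import os
import tempfile


def stat_entries(src_dir, after_listing=None):
    """Read the directory listing once, then stat each entry in turn."""
    names = sorted(os.listdir(src_dir))  # snapshot taken up front
    if after_listing is not None:
        after_listing()  # stand-in for a checkpoint running mid-copy
    return {name: os.stat(os.path.join(src_dir, name)).st_size for name in names}


def demo_race():
    with tempfile.TemporaryDirectory() as src:
        for name in ("16384", "16385", "16386"):  # hypothetical file names
            with open(os.path.join(src, name), "wb") as f:
                f.write(b"x")
        doomed = os.path.join(src, "16386")  # dropped table, unlink pending
        try:
            stat_entries(src, after_listing=lambda: os.unlink(doomed))
        except FileNotFoundError as exc:
            # Same shape of failure as "could not stat file ...: No such file"
            return os.path.basename(exc.filename)
    return None
```

Calling `demo_race()` returns the name of the file that was listed but had vanished by the time it was visited. The longer the gap between listing and visiting (for example while large files are being copied), the wider the window.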