Re: Delete a directory, crash the system
On 28/07/2013 06:54, Polytropon wrote:
> And here, kids, you can see the strength of an open source operating system: You can see _why_ something happens. :-)

Too true!

> On Sat, 27 Jul 2013 20:35:09 +0100, Frank Leonhardt wrote:
>> On 27/07/2013 19:57, David Noel wrote:
>>>> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?
>>> fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.
>>>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...
>>> Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom! Who'd have thought. Do I just post this to the mailing list and hope some benevolent developer stumbles upon it and takes it upon him/herself to fix this, or where do I find the FreeBSD Suggestion Box? I guess I should file a Problem Report and see what happens from there.
>> I was going to raise an issue when the discussion had died down to a consensus. I also don't think it's reasonable for the kernel to bomb when it encounters corruption on a disk. If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around line 2791 and change:
>>
>>     if (dp->i_effnlink < 3)
>>         panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);
>>
>> to:
>>
>>     if (dp->i_effnlink < 3) {
>>         error = EINVAL;
>>         goto out;
>>     }
>>
>> The ufs_link() call has a similar issue. I can't see why my mod will break anything, but there's always unintended consequences.
> One of the core policies usually is to stop _any_ action that has failed for a reason that "cannot be", and to make sure it won't get worse. This can be seen for example in fsck's behaviour: If there is a massive file system error that cannot be repaired without further intervention that _could_ destroy data or make its retrieval harder or impossible, the operator will be requested to make the decision. There are options to automate this process, but on the other hand, "always assume 'yes'" can then be a risk, as it could prevent recovery.
> My assumption is that the developers chose a similar approach here: We found a situation that should not be possible, so we stop the system to keep it from messing up the file system even more. This carries the attitude of not hiding a problem for the sake of convenience by being silent and going back to the usual work. Of course it is debatable if this is the right decision in _this_ particular case.

The problem I have with this is the assumption that the inode was at fault. I said this was the most likely cause, but it's not the only possible one. At the risk of repeating myself, it's the /effective/ link count (in the vnode) that's out of line here, not the inode count. If the inode was wrong it could be down to minor FS corruption; an interrupted directory creation or deletion would do the trick. The vnode could go wrong for all sorts of reasons, probably associated with a race during the directory removal, which is not an atomic operation by any means. See The Design of the UNIX Operating System, section 5.16.1, Bach, Prentice-Hall, 1986.

My guess is that we're looking at an old debugging pragma here, put in to cope with a race going wrong if the code wasn't quite right (note that the function has since been renamed but the message not updated). You're right about stopping on internal errors (corruption of the kernel data structures in this case) but this case is indeed debatable. On the one hand, now that the system is stable (i.e. we can probably trust the rmdir code after all this time), the most likely cause is inode corruption polluting the vnode. On the other hand, the pragma may be useful if people are tinkering with the kernel and you get even more opportunities for a race with (say) SMP.

I don't expect the kernel to panic on a user-land I/O error, or anything else that's expected or recoverable - and a wonky FS meets these criteria in my book. David was lucky to find this - I tend to run FreeBSD on servers, not laptops, and I'd never have seen this server panic live and would therefore not have been able to discover the cause very easily. That's worrying.

So it boils down to:

a) Leave it as is, as it can detect when the kernel has trashed its vnode table; or
b) It's probably caused by expected FS corruption, so handle it gracefully.

Incidentally, if you look at the code you'll see this is only a heuristic check, and a weak one at that. Most of the time it WILL NOT pick up the case where the parent directory's link is missing. As far as I can tell it will go on to unlink the target successfully, with no ill effects. If this situation really did lead to catastrophe (as suggested by the use of a panic) then the check used ought to be a lot more reliable! As it is, removing it entirely, except for debug kernels, is a third option.

Regards, Frank.
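Frank's third option (keep the check only in debug kernels) can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the FreeBSD code: DEBUG_KERNEL is a hypothetical build flag and check_parent_links_debug() is a hypothetical helper. In the real kernel the usual mechanism for debug-only checks is KASSERT(), which is compiled in only when the kernel is built with the INVARIANTS option.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: validate the parent's effective link count.
 * With -DDEBUG_KERNEL (an assumed flag) a bad count stops the program,
 * mimicking panic(); without it the caller just gets an error code.
 */
int
check_parent_links_debug(int effnlink)
{
	if (effnlink < 3) {
#ifdef DEBUG_KERNEL
		fprintf(stderr, "Bad link count %d on parent\n", effnlink);
		abort();		/* stand-in for panic() */
#else
		return (EINVAL);	/* production build: fail gracefully */
#endif
	}
	return (0);
}
```

Built without the flag, check_parent_links_debug(2) returns EINVAL and check_parent_links_debug(3) returns 0, so a production system would survive the bad count while a debug build would still stop at the first sign of trouble.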
Re: Delete a directory, crash the system
Ok folks, thanks again for all the help. Using the feedback I submitted a PR (#180894) -- http://www.freebsd.org/cgi/query-pr.cgi?pr=180894. I also submitted a follow-up to it with Frank's code and notes. What next? I don't really know what happens from here, but I'm guessing/hoping that someone's monitoring the PR system and will move this forward. Crossing my fingers, though if anyone knows any better methods of getting PR's addressed I'm all ears. -David ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: Delete a directory, crash the system
On 28/07/2013 06:38, David Noel wrote:
> Ok folks, thanks again for all the help. Using the feedback I submitted a PR (#180894) -- http://www.freebsd.org/cgi/query-pr.cgi?pr=180894. I also submitted a follow-up to it with Frank's code and notes.
> What next? I don't really know what happens from here, but I'm guessing/hoping that someone's monitoring the PR system and will move this forward. Crossing my fingers, though if anyone knows any better methods of getting PR's addressed I'm all ears.

You've already done the right things: raising a PR and posting about your problem on freebsd...@freebsd.org, where it is going to come to the attention of developers working on that area of the system.

Your next move should be to provide whatever additional information the developers might need to diagnose or reproduce the problem. This is really the crucial bit: unless a dev can understand what happened and how your system came to break in that particular way, it's unlikely they'll be able to fix it.

If you don't understand what's being asked for, or how to produce any required information, don't be shy about asking -- either here, or over on freebsd-fs@... It's sometimes hard to remember that the sort of debugging things you'd do routinely and without a second thought as a developer can appear as pretty arcane mysteries to the uninitiated.

You may find these bits of documentation useful:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/debugging.html
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

(especially section 10.1 about obtaining a kernel core dump, and 10.2 about using kgdb.)

Cheers, Matthew

--
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey
JID: matt...@infracaninophile.co.uk
Re: Delete a directory, crash the system
On Sun, 28 Jul 2013, Frank Leonhardt wrote:
> So it boils down to:
> a) Leave it as is, as it can detect when the kernel has trashed its vnode table; or
> b) It's probably caused by expected FS corruption, so handle it gracefully.

It would be good to log a system error message like "filesystem may be corrupt" to give the user some clue other than a seemingly impossible error with no explanation.
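The suggestion above (return an error, but also say why) might look something like this user-space sketch. The function name report_bad_link_count() and the exact message wording are assumptions for illustration; in a kernel the message would go to the system log rather than into a caller-supplied buffer.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: if the parent's effective link count is too low,
 * write a human-readable hint into buf and return EINVAL. Otherwise
 * leave buf empty and return 0.
 */
int
report_bad_link_count(char *buf, size_t len, int effnlink)
{
	if (effnlink >= 3) {
		if (len > 0)
			buf[0] = '\0';
		return (0);
	}
	snprintf(buf, len,
	    "ufs_rmdir: bad link count %d on parent; "
	    "filesystem may be corrupt, run fsck", effnlink);
	return (EINVAL);
}
```

The user still sees rmdir fail, but the log now points at the filesystem instead of leaving an unexplained "Invalid argument".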
Re: Delete a directory, crash the system
On 07/27/13 21:12, cpghost wrote:
> A more robust file system would halt all processes, and perform an in-kernel fsck on the filesystem and its internal (in-memory) structures to repair the damage... and THEN resume the processes. However, this is a major project, and we don't have a self-healing filesystem / kernel (... yet). ;-)
> -cpghost.

If we think this further, we may as well start introducing some elements of self-healing or at least self-inspecting in the kernel.

How about, for example, a kernel thread that wakes up periodically, walks through VFS structures, and checks their integrity? Perhaps also verifying the underlying inodes as well? Think background fsck, but within the kernel and for kernel structures themselves.

Other parts of the kernel could as well self-inspect for consistency with a periodic kernel thread. Some parts are easier than others, so I don't think we could also walk the VM structures (if those are corrupt, even the repair-thread will be running amok). But save for that, most parts of the kernel could use some periodic consistency checking. Make that checking optional via a sysctl(8), and it won't even cost performance.

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
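To make the idea of a consistency walk a little more concrete, here is a toy user-space model. Everything in it is an assumption for illustration: struct dirnode and count_bad_dirs() are hypothetical and have nothing to do with the real VFS structures. The rule it checks is the traditional UFS one, where a directory's link count is 2 plus its number of subdirectories.

```c
#include <stddef.h>

/* Toy in-memory directory: recorded link count plus its subdirectories. */
struct dirnode {
	int nlink;			/* link count as recorded */
	size_t nchildren;		/* number of subdirectories */
	struct dirnode **children;	/* the subdirectories themselves */
};

/*
 * Walk the tree and count directories whose recorded link count does
 * not equal 2 + (number of subdirectories).
 */
size_t
count_bad_dirs(const struct dirnode *d)
{
	size_t bad = 0;

	if (d == NULL)
		return (0);
	if (d->nlink != 2 + (int)d->nchildren)
		bad++;
	for (size_t i = 0; i < d->nchildren; i++)
		bad += count_bad_dirs(d->children[i]);
	return (bad);
}
```

A periodic checker would run a walk like this and report (or repair) anything it counts. The hard parts a real kernel thread would face, such as locking and walking structures that are changing underneath it, are deliberately left out.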
Delete a directory, crash the system
I had a strange experience on my laptop yesterday. I was deleting a directory and the system crashed. It spat out a message along the lines of ufs_dirrem bad link count 2 on parent. I thought it was so strange I repeated the process several times, and each time it crashed. Is this behavior EXPECTED? I can't for the life of me think of a time or operating system I've run where I've ever had a system crash on me from doing something as basic as deleting a file. Anyway I couldn't boot into single user for some reason so I booted from a USB image, ran fsck, and then everything was fine.
Re: Delete a directory, crash the system
On 27/07/2013 13:49, David Noel <david.i.n...@gmail.com> wrote:
> I had a strange experience on my laptop yesterday. I was deleting a directory and the system crashed. It spat out a message along the lines of ufs_dirrem bad link count 2 on parent. I thought it was so strange I repeated the process several times, and each time it crashed. Is this behavior EXPECTED? I can't for the life of me think of a time or operating system I've run where I've ever had a system crash on me from doing something as basic as deleting a file. Anyway I couldn't boot into single user for some reason so I booted from a USB image, ran fsck, and then everything was fine.

Was it a kernel crash? Did you get a core?
Re: Delete a directory, crash the system
Yes

On 7/27/13, Fernando Apesteguía <fernando.apesteg...@gmail.com> wrote:
> On 27/07/2013 13:49, David Noel <david.i.n...@gmail.com> wrote:
>> I had a strange experience on my laptop yesterday. I was deleting a directory and the system crashed. It spat out a message along the lines of ufs_dirrem bad link count 2 on parent. I thought it was so strange I repeated the process several times, and each time it crashed. Is this behavior EXPECTED? I can't for the life of me think of a time or operating system I've run where I've ever had a system crash on me from doing something as basic as deleting a file. Anyway I couldn't boot into single user for some reason so I booted from a USB image, ran fsck, and then everything was fine.
> Was it a kernel crash? Did you get a core?
Re: Delete a directory, crash the system
On 27/07/2013 14:16, David Noel <david.i.n...@gmail.com> wrote:
> Yes

Post the stack trace of the core and maybe someone can help you.

> On 7/27/13, Fernando Apesteguía <fernando.apesteg...@gmail.com> wrote:
>> On 27/07/2013 13:49, David Noel <david.i.n...@gmail.com> wrote:
>>> I had a strange experience on my laptop yesterday. I was deleting a directory and the system crashed. It spat out a message along the lines of ufs_dirrem bad link count 2 on parent. I thought it was so strange I repeated the process several times, and each time it crashed. Is this behavior EXPECTED? I can't for the life of me think of a time or operating system I've run where I've ever had a system crash on me from doing something as basic as deleting a file. Anyway I couldn't boot into single user for some reason so I booted from a USB image, ran fsck, and then everything was fine.
>> Was it a kernel crash? Did you get a core?
Re: Delete a directory, crash the system
> Post the stack trace of the core and maybe someone can help you.

panic: ufs_dirrem: Bad link count 2 on parent
cpuid = 0
KDB: stack backtrace:
#0 0x808680fe at kdb_backtrace+0x5e
#1 0x80832cb7 at panic+0x187
#2 0x80a700e3 at ufs_rmdir+0x1c3
#3 0x80b7d484 at VOP_RMDIR_APV+0x34
#4 0x808ca32a at kern_rmdirat+0x21a
#5 0x80b17cf0 at amd64_syscall+0x450
#6 0x80b03427 at Xfast_syscall+0xf7
Re: Delete a directory, crash the system
On 27/07/2013 13:58, David Noel wrote:
>> Post the stack trace of the core and maybe someone can help you.
> panic: ufs_dirrem: Bad link count 2 on parent
> cpuid = 0
> KDB: stack backtrace:
> #0 0x808680fe at kdb_backtrace+0x5e
> #1 0x80832cb7 at panic+0x187
> #2 0x80a700e3 at ufs_rmdir+0x1c3
> #3 0x80b7d484 at VOP_RMDIR_APV+0x34
> #4 0x808ca32a at kern_rmdirat+0x21a
> #5 0x80b17cf0 at amd64_syscall+0x450
> #6 0x80b03427 at Xfast_syscall+0xf7

I'm taking a guess here - the effective link count when it came to removing the parent directory was only two and it should have been three or more. This gets sanity checked before proceeding, and the kernel panics if the check fails.

Why an effective link count of three? We're talking about the parent of the directory you're trying to zap, right? There's the link to the directory from its parent, and the '.' link and the '..' link from the directory you're trying to remove. There may be more if it contains other directories, but there can't be less.

Anyway - if you only had a link count of just two effective links at the start of the delete process it suggests that the link count was messed up - either a link never existed or its count was wrong.

Should the kernel panic? Well, it's a situation that can never happen - it could simply remove the directory and pretend everything was okay, but I guess it was decided it was likely to be a symptom of impending disaster. Other anomalies return an error.

In over ten years with FreeBSD systems I can't say I've ever seen this "cannot happen" situation arise. I'd guess you had an interrupted (by power failure) inode operation at some time which caused the corruption. Removing a directory is a PITA as it can lead to a race - a context swap could create a file in it mid-way through the process.

Regards, Frank.
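The link count Frank describes can be observed from user space with stat(2), which reports it in st_nlink. The sketch below is only an illustration; dir_link_count() is a hypothetical helper name. On UFS a directory containing at least one subdirectory reports 3 or more, but some other filesystems count directory links differently, so the "at least three" rule should not be assumed everywhere.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: return the link count of the directory at path,
 * or -1 if path cannot be examined or is not a directory.
 */
long
dir_link_count(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (-1);
	if (!S_ISDIR(sb.st_mode))
		return (-1);
	return ((long)sb.st_nlink);
}
```

Calling dir_link_count() on a parent before and after creating a subdirectory in it should, on UFS, show the count go up by one, which is the '..' entry of the new child pointing back at the parent.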
Re: Delete a directory, crash the system
> I'm taking a guess here - the effective link count when it came to removing the parent directory was only two and it should have been three or more. This gets sanity checked before proceeding, and the kernel panics if the check fails.
> Why an effective link count of three? We're talking about the parent of the directory you're trying to zap, right? There's the link to the directory from its parent, and the '.' link and the '..' link from the directory you're trying to remove. There may be more if it contains other directories, but there can't be less.
> Anyway - if you only had a link count of just two effective links at the start of the delete process it suggests that the link count was messed up - either a link never existed or its count was wrong.
> Should the kernel panic? Well, it's a situation that can never happen - it could simply remove the directory and pretend everything was okay, but I guess it was decided it was likely to be a symptom of impending disaster. Other anomalies return an error.
> In over ten years with FreeBSD systems I can't say I've ever seen this "cannot happen" situation arise. I'd guess you had an interrupted (by power failure) inode operation at some time which caused the corruption. Removing a directory is a PITA as it can lead to a race - a context swap could create a file in it mid-way through the process.
> Regards, Frank.

Interesting. Thanks for the analysis. I'm not a systems guy (Java, mostly), so I don't really have the context to make much sense of kgdb output. What you're saying though makes sense and sounds about right -- it's a laptop and I've inadvertently run the battery down to nothing a few times in the past. All the same, it was a very strange experience. I would not have expected a kernel panic from a simple rm -rf!
Re: Delete a directory, crash the system
On 07/27/2013 11:30, David Noel wrote:
> -- it's a laptop and I've inadvertently run the battery down to nothing a few times in the past. All the same, it was a very strange experience. I would not have expected a kernel panic from a simple rm -rf!

You may want to look into running fsck(8) and its myriad of options to try to clean up the problem (assuming you're using a ufs filesystem, also see fsck_ufs(8)). fsck normally runs during startup but perhaps a set of non-default options will do the trick.

Also make sure you have soft updates enabled on your filesystem, and preferably journaled soft updates, if for some odd reason you don't, as that is designed to avoid filesystem inconsistencies in the face of things like power failures.

Sincerely, Jason
Re: Delete a directory, crash the system
On 07/27/13 14:58, David Noel wrote:
>> Post the stack trace of the core and maybe someone can help you.
> panic: ufs_dirrem: Bad link count 2 on parent
> cpuid = 0
> KDB: stack backtrace:
> #0 0x808680fe at kdb_backtrace+0x5e
> #1 0x80832cb7 at panic+0x187
> #2 0x80a700e3 at ufs_rmdir+0x1c3
> #3 0x80b7d484 at VOP_RMDIR_APV+0x34
> #4 0x808ca32a at kern_rmdirat+0x21a
> #5 0x80b17cf0 at amd64_syscall+0x450
> #6 0x80b03427 at Xfast_syscall+0xf7

So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?

Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
Re: Delete a directory, crash the system
> You may want to look into running fsck(8) and its myriad of options

fsck did the trick

> Also make sure you have soft updates enabled on your filesystem and preferably journaled soft updates

..pretty sure I do but I'll double check, thanks.
Re: Delete a directory, crash the system
> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?

fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.

> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...

Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom!

Who'd have thought. Do I just post this to the mailing list and hope some benevolent developer stumbles upon it and takes it upon him/herself to fix this, or where do I find the FreeBSD Suggestion Box? I guess I should file a Problem Report and see what happens from there.
Re: Delete a directory, crash the system
On 27/07/2013 19:57, David Noel wrote:
>> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?
> fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.
>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...
> Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom!
> Who'd have thought. Do I just post this to the mailing list and hope some benevolent developer stumbles upon it and takes it upon him/herself to fix this, or where do I find the FreeBSD Suggestion Box? I guess I should file a Problem Report and see what happens from there.

I was going to raise an issue when the discussion had died down to a consensus. I also don't think it's reasonable for the kernel to bomb when it encounters corruption on a disk. If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around line 2791 and change:

    if (dp->i_effnlink < 3)
        panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);

to:

    if (dp->i_effnlink < 3) {
        error = EINVAL;
        goto out;
    }

The ufs_link() call has a similar issue.

I can't see why my mod will break anything, but there's always unintended consequences. By returning "invalid argument", any code above it should already be handling that condition, although the user will be scratching their head wondering what's wrong with it. Returning ENOENT or EACCES or ENOTDIR may be better ("No such directory", "Access denied" or "Not a valid directory").

The trouble is that it's tricky to test properly without finding a good way to corrupt the link count :-)

Regards, Frank.
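For readers who want to try the logic outside the kernel, the patched behaviour can be mimicked in a few lines of user-space C. This is a sketch only: check_parent_links() is a hypothetical stand-in for the test inside ufs_rmdir(), and just the "< 3" threshold and the EINVAL return come from the patch above.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the patched check: a parent that contains
 * the directory being removed must have an effective link count of at
 * least three. Return an error instead of stopping the system.
 */
int
check_parent_links(int effnlink)
{
	if (effnlink < 3)
		return (EINVAL);	/* caller fails the rmdir with this errno */
	return (0);
}
```

With this shape the "corrupt link count" case needs no corrupted disk to exercise: check_parent_links(2) yields EINVAL, and any value of 3 or more yields 0.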
Re: Delete a directory, crash the system
> I was going to raise an issue when the discussion had died down to a consensus. I also don't think it's reasonable for the kernel to bomb when it encounters corruption on a disk. If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around line 2791 and change:
>
>     if (dp->i_effnlink < 3)
>         panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);
>
> to:
>
>     if (dp->i_effnlink < 3) {
>         error = EINVAL;
>         goto out;
>     }
>
> The ufs_link() call has a similar issue.
> I can't see why my mod will break anything, but there's always unintended consequences. By returning "invalid argument", any code above it should already be handling that condition, although the user will be scratching their head wondering what's wrong with it. Returning ENOENT or EACCES or ENOTDIR may be better ("No such directory", "Access denied" or "Not a valid directory").
> The trouble is that it's tricky to test properly without finding a good way to corrupt the link count :-)
> Regards, Frank.

Cool. Thanks for the patch!
Re: Delete a directory, crash the system
On 27/07/2013 20:38, David Noel wrote:
>> I was going to raise an issue when the discussion had died down to a consensus. I also don't think it's reasonable for the kernel to bomb when it encounters corruption on a disk. If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around line 2791 and change:
>>
>>     if (dp->i_effnlink < 3)
>>         panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);
>>
>> to:
>>
>>     if (dp->i_effnlink < 3) {
>>         error = EINVAL;
>>         goto out;
>>     }
>>
>> The ufs_link() call has a similar issue.
>> I can't see why my mod will break anything, but there's always unintended consequences. By returning "invalid argument", any code above it should already be handling that condition, although the user will be scratching their head wondering what's wrong with it. Returning ENOENT or EACCES or ENOTDIR may be better ("No such directory", "Access denied" or "Not a valid directory").
>> The trouble is that it's tricky to test properly without finding a good way to corrupt the link count :-)
>> Regards, Frank.
> Cool. Thanks for the patch!

Sorry - forgot to mention that you use it entirely at your own risk!
Re: Delete a directory, crash the system
On 07/27/13 20:57, David Noel wrote:
>> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?
> fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.
>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...
> Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom!
> Who'd have thought. Do I just post this to the mailing list and hope some benevolent developer stumbles upon it and takes it upon him/herself to fix this, or where do I find the FreeBSD Suggestion Box? I guess I should file a Problem Report and see what happens from there.

Maybe you could ask on freebsd-fs@. That's the list where the filesystem hackers are hanging around.

Basically, from /usr/src/sys/ufs/ufs/ufs_vnops.c:ufs_rmdir():

    if (dp->i_effnlink < 3)
        panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);
    if (!ufs_dirempty(ip, dp->i_number, cnp->cn_cred)) {
        error = ENOTEMPTY;
        goto out;
    }
    (...)

Basically, the parent directory has less than 3 entries, but since 2 entries are mandatory (. and ..), the 3rd entry that is missing must belong to the directory being removed. This is inconsistent. And if the parent directory is inconsistent, other bad things could happen. The kernel errs on the side of caution, and panic()s instead of silently returning EINVAL. Actually, this is a sensible thing to do in this context.

A more robust file system would halt all processes, and perform an in-kernel fsck on the filesystem and its internal (in-memory) structures to repair the damage... and THEN resume the processes. However, this is a major project, and we don't have a self-healing filesystem / kernel (... yet). ;-)

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
Re: Delete a directory, crash the system
Yes. It'd be nice if UFS/FFS would just downgrade things to read-only and not panic.

-Adrian
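As a rough picture of what "downgrade to read-only" could mean, here is a toy user-space model. All of it is assumed for illustration: struct toy_mount and toy_rmdir() are hypothetical, and a real filesystem would have far more state to quiesce before it could safely stop writing.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy mount point: the only state modelled is a read-only flag. */
struct toy_mount {
	bool read_only;
};

/*
 * Hypothetical rmdir path: on a bad parent link count, flip the mount
 * to read-only and refuse this and every later modification with EROFS.
 */
int
toy_rmdir(struct toy_mount *mp, int parent_effnlink)
{
	if (mp->read_only)
		return (EROFS);
	if (parent_effnlink < 3) {
		mp->read_only = true;	/* stop further damage, keep running */
		return (EROFS);
	}
	return (0);
}
```

After the first inconsistency is seen, even removals that would otherwise succeed return EROFS, so programs get a clear and consistent error until the filesystem has been checked.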
Re: Delete a directory, crash the system
On Sat, 27 Jul 2013 13:57:31 -0500, David Noel wrote:
>> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?
> fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.

From your initial description, a _severe_ file system defect seems to be a reasonable assumption. Make sure fsck is run in foreground prior to bringing up the system. The option background_fsck="NO" in /etc/rc.conf will make sure you won't encounter this problem again (_if_ it was related to the file system). Always make sure you're booting into a fsck'ed environment.

You could also use a S.M.A.R.T. analysis tool such as smartmontools (from ports) to make sure the OS didn't panic because of a hard disk defect. I'm just mentioning this because I have sufficient experience in this field. :-)

>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...
> Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom!

It's really surprising that a (comparably) high-level function could fail in that drastic way, but on the other hand, one would assume that there is a _reason_ for this behaviour.

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: Delete a directory, crash the system
And here, kids, you can see the strength of an open source operating system: You can see _why_ something happens. :-)

On Sat, 27 Jul 2013 20:35:09 +0100, Frank Leonhardt wrote:
> On 27/07/2013 19:57, David Noel wrote:
>>> So the system panics in ufs_rmdir(). Maybe the filesystem is corrupt? Have you tried to fsck(8) it manually?
>> fsck worked, though I had to boot from a USB image because I couldn't get into single user.. for some odd reason.
>>> Even if the filesystem is corrupt, ufs_rmdir() shouldn't panic(), IMHO, but fail gracefully. Hmmm...
>> Yeah, I was pretty surprised. I think I tried it like 3 times to be sure... and yeah, each time... kaboom!
>> Who'd have thought. Do I just post this to the mailing list and hope some benevolent developer stumbles upon it and takes it upon him/herself to fix this, or where do I find the FreeBSD Suggestion Box? I guess I should file a Problem Report and see what happens from there.
> I was going to raise an issue when the discussion had died down to a consensus. I also don't think it's reasonable for the kernel to bomb when it encounters corruption on a disk. If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around line 2791 and change:
>
>     if (dp->i_effnlink < 3)
>         panic("ufs_dirrem: Bad link count %d on parent", dp->i_effnlink);
>
> to:
>
>     if (dp->i_effnlink < 3) {
>         error = EINVAL;
>         goto out;
>     }
>
> The ufs_link() call has a similar issue.
> I can't see why my mod will break anything, but there's always unintended consequences.

One of the core policies usually is to stop _any_ action that has failed for a reason that "cannot be", and to make sure it won't get worse. This can be seen for example in fsck's behaviour: If there is a massive file system error that cannot be repaired without further intervention that _could_ destroy data or make its retrieval harder or impossible, the operator will be requested to make the decision. There are options to automate this process, but on the other hand, "always assume 'yes'" can then be a risk, as it could prevent recovery.

My assumption is that the developers chose a similar approach here: We found a situation that should not be possible, so we stop the system to keep it from messing up the file system even more. This carries the attitude of not hiding a problem for the sake of convenience by being silent and going back to the usual work. Of course it is debatable if this is the right decision in _this_ particular case.

> By returning "invalid argument", any code above it should already be handling that condition, although the user will be scratching their head wondering what's wrong with it.

By determining the inode number and using the fsdb tool, internal data about inodes can be examined. Will it also show something that's basically impossible? :-)

> Returning ENOENT or EACCES or ENOTDIR may be better ("No such directory", "Access denied" or "Not a valid directory").

Depends on the applicable definition of those errors.

> The trouble is that it's tricky to test properly without finding a good way to corrupt the link count :-)

There is a _simple_ way to do this, and I have even mentioned it. Use the fsdb program and manipulate the inode manually. Make sure that you actually understand that _what_ you are doing there is creating severe file system inconsistency errors. :-)

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: Delete a directory, crash the system
On Sat, 27 Jul 2013 14:57:07 -0700, Adrian Chadd wrote:
> Yes. It'd be nice if UFS/FFS would just downgrade things to read-only and not panic.

That would be possible, but it would confuse programs and users. It's not that you could walk up to the disk drive and flip the write protect switch back... ;-)

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...