Using event counters with network interfaces, is there a reason they're all ifdefed out of mainline use?
Hello. I notice that most, if not all, of the network drivers in NetBSD have interface counters which they use to track things like collisions, CRC errors, framing errors, etc. It looks like these counters, and the framework for displaying them, have been in NetBSD for well over 10 years, yet all of these counters are ifdefed out of general use and hence unavailable to users who run generic kernels, or who didn't happen to pursue what those EVENT_COUNTERS ifdefs meant in the various drivers.

Is there a reason all of these counting facilities are not enabled by default in GENERIC kernels? Does using these counters impose such a performance penalty that general use was deemed too crippling? I think having these counting facilities available to the general NetBSD user would be a huge win. As such, I propose to embark on a project to enable such counters in GENERIC kernels so that users may view these extended stats about their network performance.

I'll note that event counters seem to be enabled in the NetBSD-5.x kernels to do things like count the number of TLB flushes, ioapic interrupts and the like. If it works for those high-frequency items, why not enable it for network drivers?

My thought is to define a generic option, say ENABLE_INTERFACE_COUNTERS, which would turn on these counters for drivers which had been tested and were known to work. Then, for each driver, enable its counting options and test. Finally, once local testing was complete, check in a change for that driver which would hook it to the general counting option and, as a result, add that driver's counting capabilities to the GENERIC kernel.

Is there a reason this should not be done? Are there caveats that I need to be aware of? Having event stats on network interfaces would be a huge bonus for NetBSD's usability and, since it looks like it's almost there, why not make it happen?

Thoughts? Objections? Encouragement?

-thanks
-Brian
Re: Using event counters with network interfaces, is there a reason they're all ifdefed out of mainline use?
> Is there a reason all of these counting facilities are not enabled by
> default in GENERIC kernels? Does using these counters impose such a
> performance penalty that general use was deemed too crippling?

Did you try any benchmarks (with ttcp(1), etc.)?

During mec(4) (on sgimips O2) debugging, enabling evcnts made network transfers notably slower, but it probably depends on how many counters the driver has and how often they are bumped. I think we need benchmark results per interface rather than blindly enabling counters, because most ordinary users don't care about driver internals, just the visible transfer rates.

---
Izumi Tsutsui
Re: Use consistent errno for read(2) failure on directories
In article 20111209083354.ga2...@lynche.sis.pasteur.fr, Nicolas Joly nj...@pasteur.fr wrote:

> Hi,
>
> According to the online OpenGroup specification for read(2) available
> at [1], read(2) on directories is implementation-dependent. If
> unsupported, it shall fail with EISDIR. Not all our file systems
> comply, and return random errno values in this case (mostly EINVAL or
> ENOTSUP).
>
> The attached patch fixes some of them (the ones I have access to),
> adjusts the man page accordingly and adds a small testcase to exercise
> this.
>
> Is it ok to apply ? Thanks.

Looks good to me.

christos
Re: Use consistent errno for read(2) failure on directories
On Fri, Dec 09, 2011 at 09:33:54AM +0100, Nicolas Joly wrote:

> According to the online OpenGroup specification for read(2) available
> at [1], read(2) on directories is implementation-dependent. If
> unsupported, it shall fail with EISDIR. Not all our file systems
> comply, and return random errno values in this case (mostly EINVAL or
> ENOTSUP). The attached patch fixes some of them (the ones I have
> access to), adjusts the man page accordingly and adds a small testcase
> to exercise this.
>
> Is it ok to apply ?

Yes, although I'm wondering if maybe it wouldn't be better to insert a filesystem-independent check and give up on the old ffs behavior. After all, application writers have had, what, 25 years now? to learn not to do this.

--
David A. Holland
dholl...@netbsd.org
Re: Use consistent errno for read(2) failure on directories
> According to the online OpenGroup specification for read(2) available
> at [1], read(2) on directories is implementation-dependent. If
> unsupported, it shall fail with EISDIR. Not all our file systems
> comply, and return random errno values in this case (mostly EINVAL or
> ENOTSUP).

How does that not comply with "implementation dependent"? From a standards-conformance point of view, that's equivalent to "in this implementation, read(2) on directories is supported: on $FILESYSTEM, it always returns EINVAL; on $OTHER_FILESYSTEM, it works according to $REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP".

This is not to say that it shouldn't be cleaned up. Just that I don't think it's actually nonconformant.

/~\ The ASCII				Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Use consistent errno for read(2) failure on directories
On Dec 10, 2011, at 12:06:18 PM, Mouse wrote:

>> According to the online OpenGroup specification for read(2) available
>> at [1], read(2) on directories is implementation-dependent. If
>> unsupported, it shall fail with EISDIR. Not all our file systems
>> comply, and return random errno values in this case (mostly EINVAL or
>> ENOTSUP).
>
> How does that not comply with "implementation dependent"? From a
> standards-conformance point of view, that's equivalent to "in this
> implementation, read(2) on directories is supported: on $FILESYSTEM,
> it always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
> $REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP".
>
> This is not to say that it shouldn't be cleaned up. Just that I don't
> think it's actually nonconformant.

As I read the text, whether or not you support it, and how it behaves if you do, is up to you, i.e., is implementation-dependent. However, if you do not support it, there's a particular error you're supposed to return: EISDIR. Arguably, an implementation that sometimes allows it and sometimes doesn't (NetBSD, depending on the file system in question) can do what it wants, but I don't think that that will help application writers.

--Steve Bellovin, https://www.cs.columbia.edu/~smb
Re: Lost file-system story
My impression is that you are asking for the impossible.

The underlying misconception (which I know very well from suffering from it myself) is that a filesystem aims at keeping the on-disc metadata consistent and that tools like fsck are intended to repair any inconsistencies happening nonetheless. This, I learned, is not true.

The point of synchronous metadata writes, soft dependency metadata write re-ordering, logging/journaling/WAPBL and whatnot is _not_ to keep the on-disc metadata consistent. The sole point is to, under all adverse conditions, leave that metadata in a state that can be _put back_ into a consistent state (preferably reflecting an in-memory state not too far back from the time of the crash) by fsck, on-mount journal replay or whatever.

That difference becomes perfectly clear with journalling. After an unclean shutdown, the on-disc metadata need not be consistent. But the journal enables putting it back into a consistent state quite easily.

So fsck is not aimed at, and does not claim to be able to, recover from random inconsistencies in the on-disc metadata. It is aimed at repairing those inconsistencies that can occur from a crash _given the metadata was written synchronously_. FreeBSD's background fsck, by the way, is aimed at repairing only those inconsistencies that can occur given the metadata was written with softdep's re-ordering.

Of course, keeping the on-disc metadata in a ``repairable'' state incurs a performance penalty. So you seem to be asking for the File System Holy Grail: a file system that is as fast as asynchronous metadata writes, yet able to survive any possible kind of unclean shutdown. Such a thing, to my knowledge, doesn't exist.
Re: Lost file-system story
On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:

> My impression is that you are asking for the impossible.
>
> The underlying misconception (which I know very well from suffering
> from it myself) is that a filesystem aims at keeping the on-disc
> metadata consistent and that tools like fsck are intended to repair
> any inconsistencies happening nonetheless. This, I learned, is not
> true. The point of synchronous metadata writes, soft dependency
> metadata write re-ordering, logging/journaling/WAPBL and whatnot is
> _not_ to keep the on-disc metadata consistent. The sole point is to,
> under all adverse conditions, leave that metadata in a state that can
> be _put back_ into a consistent state (preferably reflecting an
> in-memory state not too far back from the time of the crash) by fsck,
> on-mount journal replay or whatever.
>
> That difference becomes perfectly clear with journalling. After an
> unclean shutdown, the on-disc metadata need not be consistent. But the
> journal enables putting it back into a consistent state quite easily.
> So fsck is not aimed at, and does not claim to be able to, recover
> from random inconsistencies in the on-disc metadata. It is aimed at
> repairing those inconsistencies that can occur from a crash _given the
> metadata was written synchronously_. FreeBSD's background fsck, by the
> way, is aimed at repairing only those inconsistencies that can occur
> given the metadata was written with softdep's re-ordering.
>
> Of course, keeping the on-disc metadata in a ``repairable'' state
> incurs a performance penalty. So you seem to be asking for the File
> System Holy Grail: a file system that is as fast as asynchronous
> metadata writes, yet able to survive any possible kind of unclean
> shutdown. Such a thing, to my knowledge, doesn't exist.

I'm sorry, I don't wish to be rude, but you, too, seem not to have read what I've written carefully. Or, perhaps the fault is mine, that I simply haven't made myself sufficiently clear.
I've talked at length about the behavior of Linux ext2 and that it was more than acceptable, both from a standpoint of performance and reliability. I am not looking for something able to survive any possible kind of unclean shutdown. I'm looking for a reasonably low joint probability of a crash occurring *and* losing an async-mounted filesystem as a result. I simply want an async implementation where the benefit (performance) is not outweighed by the risk (lost filesystems), and I cited Linux ext2 as an example of that. If that's not clear to you, then I'm afraid I can't do better.
Re: Lost file-system story
On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow buh...@lothlorien.nfbcal.org wrote:

> Hello. Just for your edification, it is possible to break out of fsck
> mid-way and reinvoke it with fsck -y to get it to do the cleaning on
> its own.

This whole discussion, interesting though it may be, may have occurred simply because of my unfamiliarity with NetBSD, and probably a mistake in not looking at the fsck man page for something like the -y option when I reached the point where continuing to feed 'y's to fsck after the original crash seemed like a losing battle. Had I thought about -y (I know that fscks typically have such an option, but in my experience it's an optional answer to fsck questions, as OpenBSD's is; for whatever reason, I didn't think of it), I'd have used it, since I had nothing to lose at that point. But it's possible you have put your finger on the real truth of what happened here. Read on.

You suggested trying the experiment I did with OpenBSD with NetBSD, and so I did. Twice. I installed NetBSD with separate file systems for /, /usr, /var, /tmp, and /home, a la OpenBSD's default setup. All except /home and /tmp were mounted softdep,noatime. /home was mounted async, and /tmp is an in-memory filesystem.

The first time, I untarred the OpenBSD ports.tar.gz (I used it because it was what I used in the OpenBSD test, it's big, and I had it lying around) into a temporary directory in my home directory. With the battery removed from the laptop, I did an rm -rf ports and, while that was happening, I pulled the power connector. On restart, fsck found a bunch of things it didn't like about the /home filesystem, but managed to fix things up to its satisfaction and declare the filesystem clean. My home directory survived this and, like OpenBSD, a fair amount of the ports directory was still present. I then removed it and re-did the untar; while the untar was happening, I again pulled the plug.
This time, the automatic fsck got unhappy enough to drop me into single-user mode, and I ran fsck there manually. I again encountered a seemingly never-ending sequence of requests to fix this and that. So I aborted and used the -y option. It charged through a bunch of trouble spots and completed. On reboot, I found the same situation as the first one -- home directory intact and some of the ports directory present.

I have some thoughts about this:

1. Had I run fsck -y at the time of the first crash, I might well have found what I found today -- a repaired filesystem that was usable. So my assertion that the filesystem was lost may well have simply been my lack of skill as a NetBSD sys-admin.

2. Today's experiment shows that a NetBSD ffs filesystem mounted async, together with its fsck, *is* capable of surviving even a pretty brutal improper shutdown -- loss of power while a lot of writing was happening. Obviously I still don't have enough data to know if the probability of survival is comparable to Linux ext2, but what I found today is at least encouraging.

I did one more experiment: I untarred the ports tarball and then waited about a minute. I then did a sync. The disk light blinked just for a brief moment. This is a *big* tar file, but it appears from this easy little test that there was not a huge amount of dirty stuff sitting in the buffer cache. This is obviously not definitive, but it does suggest that NetBSD is migrating stuff from the buffer cache back to the disk for async-mounted filesystems in a timely fashion. A look at the code is probably the final arbiter here. I also note that there are sysctl items, such as vfs.sync.metadelay, that I would like to understand.

/Don Allen
Re: Use consistent errno for read(2) failure on directories
On Sat, Dec 10, 2011 at 12:06:18PM -0500, Mouse wrote:

>> According to the online OpenGroup specification for read(2) available
>> at [1], read(2) on directories is implementation-dependent. If
>> unsupported, it shall fail with EISDIR. Not all our file systems
>> comply, and return random errno values in this case (mostly EINVAL or
>> ENOTSUP).
>
> How does that not comply with "implementation dependent"? From a
> standards-conformance point of view, that's equivalent to "in this
> implementation, read(2) on directories is supported: on $FILESYSTEM,
> it always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
> $REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP".
>
> This is not to say that it shouldn't be cleaned up. Just that I don't
> think it's actually nonconformant.

Actually, it's implementation-dependent whether a file system allows read(2) on directories; and if the implementation does not support it, it should fail with EISDIR.

--
Nicolas Joly
Projects and Developments in Bioinformatics
Institut Pasteur, Paris.
Re: Lost file-system story
Donald Allen donaldcal...@gmail.com writes:

> On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
>> Of course, keeping the on-disc metadata in a ``repairable'' state
>> incurs a performance penalty. So you seem to be asking for the File
>> System Holy Grail: a file system that is as fast as asynchronous
>> metadata writes, yet able to survive any possible kind of unclean
>> shutdown. Such a thing, to my knowledge, doesn't exist.
>
> I'm sorry, I don't wish to be rude, but you, too, seem not to have
> read what I've written carefully. Or, perhaps the fault is mine, that
> I simply haven't made myself sufficiently clear. I've talked at length
> about the behavior of Linux ext2 and that it was more than acceptable,
> both from a standpoint of performance and reliability. I am not
> looking for something able to survive any possible kind of unclean
> shutdown. I'm looking for a reasonably low joint probability of a
> crash occurring *and* losing an async-mounted filesystem as a result.
> I simply want an async implementation where the benefit (performance)
> is not outweighed by the risk (lost filesystems), and I cited Linux
> ext2 as an example of that. If that's not clear to you, then I'm
> afraid I can't do better.

I think it should be clear that async mount excludes what you want. Async mount basically means that you create a fresh file system after boot. In Linux it may mean another thing (e.g., it may be less asynchronous); in the BSDs it means exactly that. Thus, unless you really can afford starting the file system from scratch, don't mount it async.

--
HE CE3OH...