Using event counters with network interfaces, is there a reason they're all ifdefed out of mainline use?

2011-12-10 Thread Brian Buhrow
Hello.  I notice that most, if not all, of the network drivers in
NetBSD have interface counters which they use to track things like
collisions, CRC errors, framing errors, etc.  It looks like these counters,
and the framework for displaying them, have been in NetBSD for well
over 10 years, yet all of these counters are ifdefed out of general use and
hence unavailable to users who run generic kernels, or who didn't
happen to pursue what those EVENT_COUNTERS ifdefs meant in the various
drivers.  Is there a reason all of these counting facilities are not
enabled by default in GENERIC kernels?  Does using these counters impose
such a performance penalty that general use was deemed too crippling?
I think having these counting facilities available to the general
NetBSD user would be a huge win.  As such, I propose to embark on a project
to enable such counters in GENERIC kernels so that users may view these
extended stats about their network performance.  I'll note that event
counters already seem to be enabled in the NetBSD-5.x kernels to do things
like count the number of TLB flushes, ioapic interrupts and the like.  If it
works for those high-frequency items, why not enable it for network
drivers?
My thought is to define a generic option, say
ENABLE_INTERFACE_COUNTERS, which would turn on these counters for drivers
which had been tested and were known to work.  Then, for each driver,
enable its counting options and test.  Finally, once local testing was
complete, check in a change for that driver which would hook it to the
general counting option and, as a result, add that driver's counting
capabilities to the GENERIC kernel.  Is there a reason this should not be
done?  Are there caveats that I need to be aware of?  Having event stats on
network interfaces would be a huge bonus for NetBSD's usability and, since
it looks like it's almost there, why not make it happen?
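
For concreteness, here is a rough sketch of the sort of thing I have in
mind for a driver.  The option name and the softc fields are placeholders
of mine, not existing code; the evcnt(9) calls are the standard ones:

    #include <sys/evcnt.h>
    #include <sys/device.h>

    struct xx_softc {
        device_t        sc_dev;
    #ifdef ENABLE_INTERFACE_COUNTERS        /* proposed generic option */
        struct evcnt    sc_ev_rxcrc;        /* receive CRC errors */
        struct evcnt    sc_ev_txcoll;       /* transmit collisions */
    #endif
    };

    static void
    xx_attach_counters(struct xx_softc *sc)
    {
    #ifdef ENABLE_INTERFACE_COUNTERS
        /* evcnt_attach_dynamic(9) makes the counters visible to vmstat -e. */
        evcnt_attach_dynamic(&sc->sc_ev_rxcrc, EVCNT_TYPE_MISC, NULL,
            device_xname(sc->sc_dev), "rx crc errors");
        evcnt_attach_dynamic(&sc->sc_ev_txcoll, EVCNT_TYPE_MISC, NULL,
            device_xname(sc->sc_dev), "tx collisions");
    #endif
    }

Users of a GENERIC kernel could then read the counters with vmstat -e,
the same way the existing TLB-flush and interrupt counters are read.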

Thoughts?  Objections?  Encouragement?

-thanks
-Brian


Re: Using event counters with network interfaces, is there a reason they're all ifdefed out of mainline use?

2011-12-10 Thread Izumi Tsutsui

 drivers.  Is there a reason all of these counting facilities are not
 enabled by default in GENERIC kernels?  Does using these counters impose
 such a performance penalty that general use was deemed too crippling?

Did you try any benchmarks (with ttcp(1), etc.)?

While debugging mec(4) (on sgimips O2), enabling evcnts made
network transfers notably slower, but it probably depends on how many
counters the driver has and how often they are updated.
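
For example, the per-packet updates are where the cost shows up.  A
driver-local pattern along these lines (the names here are made up for
illustration, not taken from any real driver) runs for every frame when
the counters are compiled in:

    /* Hypothetical per-driver wrapper, compiled out when the option is off. */
    #ifdef XX_EVENT_COUNTERS
    #define XX_EVCNT_INCR(ev)   ((ev)->ev_count++)
    #else
    #define XX_EVCNT_INCR(ev)   do { } while (/* CONSTCOND */ 0)
    #endif

    static void
    xx_rxdesc_done(struct xx_softc *sc, uint32_t status)
    {
        /*
         * One or two extra increments per frame: cheap on a fast CPU,
         * but measurable on a machine like the O2 at high packet rates.
         * The sc_ev_* fields and XX_RXSTAT_* bits are assumed to be
         * defined elsewhere in this hypothetical driver.
         */
        if (status & XX_RXSTAT_CRCERR)
            XX_EVCNT_INCR(&sc->sc_ev_rxcrc);
        if (status & XX_RXSTAT_FRAMERR)
            XX_EVCNT_INCR(&sc->sc_ev_rxframe);
    }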

I think we need benchmark results per interface
rather than blindly enabling counters, because
most ordinary users don't care about driver internals,
only about the visible transfer rates.

---
Izumi Tsutsui


Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread Christos Zoulas
In article 20111209083354.ga2...@lynche.sis.pasteur.fr,
Nicolas Joly  nj...@pasteur.fr wrote:

Hi,

According to the online OpenGroup specification for read(2) available
at [1], read(2) on directories is implementation dependent.  If
unsupported, it shall fail with EISDIR.

Not all our file systems comply, and return random errno values in
this case (mostly EINVAL or ENOTSUP).

The attached patch fixes some of them (the ones I have access to),
adjusts the man page accordingly and adds a small test case to exercise
this.

Is it OK to apply?
Thanks.

Looks good to me.

christos



Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread David Holland
On Fri, Dec 09, 2011 at 09:33:54AM +0100, Nicolas Joly wrote:
  According to the online OpenGroup specification for read(2) available
  at [1], read(2) on directories is implementation dependent.  If
  unsupported, it shall fail with EISDIR.
  
  Not all our file systems comply, and return random errno values in
  this case (mostly EINVAL or ENOTSUP).
  
  The attached patch fixes some of them (the ones I have access to),
  adjusts the man page accordingly and adds a small test case to exercise
  this.
  
  Is it OK to apply?

Yes, although I'm wondering if maybe it wouldn't be better to insert a
filesystem-independent check and give up on the old ffs behavior.
After all, application writers have had, what, 25 years now to learn
not to do this.
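
Roughly, I mean something like the following (a sketch only; the exact
placement in the fd-layer read path, e.g. vn_read() in kern/vfs_vnops.c,
would need checking):

    /*
     * Reject directories centrally, before dispatching to the individual
     * file system, so every file system returns the same errno.
     */
    if (vp->v_type == VDIR)
        return EISDIR;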

-- 
David A. Holland
dholl...@netbsd.org


Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread Mouse
 According to the online OpenGroup specification for read(2)
 available at [1], read(2) on directories is implementation
 dependent.  If unsupported, it shall fail with EISDIR.

 Not all our file systems comply, and return random errno values in
 this case (mostly EINVAL or ENOTSUP).

How does that not comply with "implementation dependent"?  From a
standards-conformance point of view, that's equivalent to "in this
implementation, read(2) on directories is supported: on $FILESYSTEM, it
always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
$REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP."

This is not to say that it shouldn't be cleaned up.  Just that I don't
think it's actually nonconformant.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread Steven Bellovin

On Dec 10, 2011, at 12:06:18 PM, Mouse wrote:

 According to the online OpenGroup specification for read(2)
 available at [1], read(2) on directories is implementation
 dependent.  If unsupported, it shall fail with EISDIR.
 
 Not all our file systems comply, and return random errno values in
 this case (mostly EINVAL or ENOTSUP).
 
 How does that not comply with "implementation dependent"?  From a
 standards-conformance point of view, that's equivalent to "in this
 implementation, read(2) on directories is supported: on $FILESYSTEM, it
 always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
 $REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP."
 
 This is not to say that it shouldn't be cleaned up.  Just that I don't
 think it's actually nonconformant.


As I read the text, whether or not you support it, and how it behaves
if you do, is up to you, i.e., is implementation-dependent.  However,
if you do not support it, there's a particular error you're supposed
to return: EISDIR.  Arguably, an implementation that sometimes allows
it and sometimes doesn't (NetBSD, depending on the file system in
question) can do what it wants, but I don't think that will help
application writers.

--Steve Bellovin, https://www.cs.columbia.edu/~smb







Re: Lost file-system story

2011-12-10 Thread Edgar Fuß
My impression is that you are asking for the impossible.

The underlying misconception (which I know very well from suffering from it
myself) is that a filesystem aims at keeping the on-disc metadata consistent
and that tools like fsck are intended to repair any inconsistencies happening
nonetheless.

This, I learned, is not true.

The point of synchronous metadata writes, soft dependency metadata write
re-ordering, logging/journaling/WAPBL and whatnot is _not_ to keep the on-disc
metadata consistent. The sole point is to, under all adverse conditions, leave
that metadata in a state that can be _put back_ into a consistent state
(preferably reflecting an in-memory state not too far back from the time of the
crash) by fsck, on-mount journal replay or whatever.

That difference becomes perfectly clear with journalling. After an unclean
shutdown, the on-disc metadata need not be consistent. But the journal enables
putting it back into a consistent state quite easily.

So fsck does not aim, and does not claim to be able, to recover from random
inconsistencies in the on-disc metadata. It is aimed at repairing those
inconsistencies that can occur because of a crash _given that the metadata was
written synchronously_.

FreeBSD's background fsck, by the way, is aimed at repairing only those
inconsistencies that can occur given that the metadata was written with
softdep's re-ordering.

Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
performance penalty.

So you seem to be asking for the File System Holy Grail: a file system that is
as fast as asynchronous metadata writes, yet able to survive any possible kind
of unclean shutdown. Such a thing, to my knowledge, doesn't exist.


Re: Lost file-system story

2011-12-10 Thread Donald Allen
On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
 My impression is that you are asking for the impossible.

 The underlying misconception (which I know very well from suffering from it
 myself) is that a filesystem aims at keeping the on-disc metadata consistent
 and that tools like fsck are intended to repair any inconsistencies happening
 nonetheless.

 This, I learned, is not true.

 The point of synchronous metadata writes, soft dependency metadata write
 re-ordering, logging/journaling/WAPBL and whatnot is _not_ to keep the
 on-disc metadata consistent. The sole point is to, under all adverse
 conditions, leave that metadata in a state that can be _put back_ into a
 consistent state (preferably reflecting an in-memory state not too far back
 from the time of the crash) by fsck, on-mount journal replay or whatever.
 That difference becomes perfectly clear with journalling. After an unclean
 shutdown, the on-disc metadata need not be consistent. But the journal
 enables putting it back into a consistent state quite easily.
 So fsck does not aim, and does not claim to be able, to recover from random
 inconsistencies in the on-disc metadata. It is aimed at repairing those
 inconsistencies that can occur because of a crash _given that the metadata
 was written synchronously_.
 FreeBSD's background fsck, by the way, is aimed at repairing only those
 inconsistencies that can occur given that the metadata was written with
 softdep's re-ordering.

 Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
 performance penalty.

 So you seem to be asking for the File System Holy Grail: a file system that
 is as fast as asynchronous metadata writes, yet able to survive any possible
 kind of unclean shutdown. Such a thing, to my knowledge, doesn't exist.

I'm sorry, I don't wish to be rude, but you, too, seem not to have
read what I've written carefully. Or perhaps the fault is mine, in that
I simply haven't made myself sufficiently clear. I've talked at length
about the behavior of Linux ext2 and said that it was more than acceptable,
both from the standpoint of performance and of reliability. I am not
looking for something able to survive any possible kind of unclean
shutdown. I'm looking for a reasonably low joint probability of a
crash occurring *and* losing an async-mounted filesystem as a result.
I simply want an async implementation where the benefit (performance)
is not outweighed by the risk (lost filesystems), and I cited Linux
ext2 as an example of that. If that's not clear to you, then I'm
afraid I can't do better.


Re: Lost file-system story

2011-12-10 Thread Donald Allen
On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow
buh...@lothlorien.nfbcal.org wrote:
        Hello.  Just for your edification, it is possible to break out of fsck
 mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
 own.

This whole discussion, interesting though it may be, may have occurred
simply because of my unfamiliarity with NetBSD, and probably because of
my mistake in not looking at the fsck man page for something like the -y
option when I reached the point where continuing to feed 'y's to fsck
after the original crash seemed like a losing battle. Had I thought of -y
(I know that fscks typically have such an option, but in my experience
it's an optional answer to fsck's questions, as OpenBSD's is; for
whatever reason, I didn't think of it), I'd have used it, since I had
nothing to lose at that point. But it's possible you have put your
finger on the real truth of what happened here. Read on.

You suggested trying the experiment I did with OpenBSD with NetBSD,
and so I did. Twice. I installed NetBSD with separate file systems for
/, /usr, /var, /tmp, and /home, a la OpenBSD's default setup. All
except /home and /tmp were mounted softdep,noatime. /home was mounted
async, and /tmp is an in-memory filesystem. The first time, I untarred
the OpenBSD ports.tar.gz (I used it because it was what I used in the
OpenBSD test, it's big, and I had it lying around) into a temporary
directory in my home directory. With the battery removed from the
laptop, I did an

rm -rf ports

and while that was happening, I pulled the power connector.

On restart, fsck found a bunch of things it didn't like about the
/home filesystem, but it managed to fix things up to its satisfaction and
declare the filesystem clean. My home directory survived this and,
as with OpenBSD, a fair amount of the ports directory was still present.
I then removed it and re-did the untar; while the untar was happening,
I again pulled the plug. This time, the automatic fsck got unhappy
enough to drop me into single-user mode, and I ran fsck there manually. I
again encountered a seemingly never-ending sequence of requests to fix
this and that. So I aborted and used the -y option. It charged through
a bunch of trouble spots and completed. On reboot, I found the same
situation as the first time -- home directory intact and some of the
ports directory present.

I have some thoughts about this:

1. Had I run fsck -y at the time of the first crash, I might well have
found what I found today -- a repaired filesystem that was usable. So
my assertion that the filesystem was lost may well have simply been my
lack of skill as a NetBSD sys-admin.
2. Today's experiment shows that a NetBSD ffs filesystem mounted
async, together with its fsck, *is* capable of surviving even a pretty
brutal improper shutdown -- loss of power while a lot of writing was
happening. Obviously I still don't have enough data to know if the
probability of survival is comparable to Linux ext2, but what I found
today is at least encouraging.

I did one more experiment: I untarred the ports tarball and then
waited about a minute. I then did a sync. The disk light blinked for
just a brief moment. This is a *big* tar file, but it appears from this
easy little test that there was not a huge amount of dirty stuff
sitting in the buffer cache. This is obviously not definitive, but it
does suggest that NetBSD is migrating stuff from the buffer cache back
to the disk in a timely fashion even for async-mounted filesystems. A
look at the code is probably the final arbiter here. I also note that
there are sysctl items, such as vfs.sync.metadelay, that I would like
to understand.
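
For what it's worth, the knob can be read from a little program like
this (sysctl(8) does the same thing, of course; I'm assuming the value
is an integer number of seconds, which I haven't verified against the
source):

    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int delay;
        size_t len = sizeof(delay);

        /*
         * vfs.sync.metadelay: how long dirty metadata may sit before the
         * syncer pushes it out (assumed to be seconds).
         */
        if (sysctlbyname("vfs.sync.metadelay", &delay, &len, NULL, 0) == -1) {
            perror("sysctlbyname");
            return 1;
        }
        printf("vfs.sync.metadelay = %d\n", delay);
        return 0;
    }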

/Don Allen


Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread Nicolas Joly
On Sat, Dec 10, 2011 at 12:06:18PM -0500, Mouse wrote:
  According to the online OpenGroup specification for read(2)
  available at [1], read(2) on directories is implementation
  dependent.  If unsupported, it shall fail with EISDIR.
 
  Not all our file systems comply, and return random errno values in
  this case (mostly EINVAL or ENOTSUP).
 
 How does that not comply with "implementation dependent"?  From a
 standards-conformance point of view, that's equivalent to "in this
 implementation, read(2) on directories is supported: on $FILESYSTEM, it
 always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
 $REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP."
 
 This is not to say that it shouldn't be cleaned up.  Just that I don't
 think it's actually nonconformant.

Actually, whether to allow read(2) on directories is filesystem
implementation dependent; but if the implementation does not support it,
it should fail with EISDIR.
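
For reference, a stripped-down userland check of that behaviour (just an
illustration, not the actual atf test case from the patch) would be:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        char buf[64];
        int fd = open(".", O_RDONLY);

        if (fd == -1) {
            perror("open");
            return 1;
        }
        if (read(fd, buf, sizeof(buf)) == -1)
            printf("read: errno %d (%s)\n", errno,
                errno == EISDIR ? "EISDIR, as specified" : "something else");
        else
            printf("read(2) on a directory succeeded on this file system\n");
        close(fd);
        return 0;
    }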

-- 
Nicolas Joly

Projects and Developments in Bioinformatics
Institut Pasteur, Paris.


Re: Lost file-system story

2011-12-10 Thread Aleksej Saushev
Donald Allen donaldcal...@gmail.com writes:

 On Sat, Dec 10, 2011 at 1:14 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
 Of course, keeping the on-disc metadata in a ``repairable'' state incurs a 
 performance penalty.

 So you seem to be asking for the File System Holy Grail: a file
 system that is as fast as asynchronous metadata writes, yet able to
 survive any possible kind of unclean shutdown. Such a thing, to my
 knowledge, doesn't exist.

 I'm sorry, I don't wish to be rude, but you, too, seem not to have
 read what I've written carefully. Or perhaps the fault is mine, in that
 I simply haven't made myself sufficiently clear. I've talked at length
 about the behavior of Linux ext2 and said that it was more than acceptable,
 both from the standpoint of performance and of reliability. I am not
 looking for something able to survive any possible kind of unclean
 shutdown. I'm looking for a reasonably low joint probability of a
 crash occurring *and* losing an async-mounted filesystem as a result.
 I simply want an async implementation where the benefit (performance)
 is not outweighed by the risk (lost filesystems), and I cited Linux
 ext2 as an example of that. If that's not clear to you, then I'm
 afraid I can't do better.

I think it should be clear that an async mount excludes what you want.
An async mount basically means that you create a fresh file system after
a reboot. In Linux it may mean something else (e.g., it may be less
asynchronous); in the BSDs it means exactly that. Thus, unless you really
can afford to start the file system from scratch, don't mount it async.


-- 
HE CE3OH...