Re: 4.0.1 i386 wedged

2013-05-22 Thread der Mouse
>> I was under the impression that vnconfigging an nfs-mounted file
>> continues to not work to today -- [...]
> It _used_ to work - in recent years I did a recovery of a linux lvm
> backup by nfs mounting the storage that had the file created by
> dd'ing the linux disk, doing a vnconfig and then a lvm change to get
> at the linux file systems.

Oh, it worked for me too - for a while.  It didn't wedge until, I
suspect, the system came under memory pressure.  If I'd written an
amount that's small compared to available RAM before flushing and
unmounting, I very well might not have noticed any problem at all.

Also, your use case was read-mostly, sounds like.  Mine was not.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 4.0.1 i386 wedged

2013-05-21 Thread der Mouse
> I was under the impression that vnconfigging an nfs-mounted file
> continues to not work to today -- [...].  In other words, "Don't do
> that".

Okay.  I juggled things around a bit, so the vnconfig and mount were
done on the NFS server instead, with the vnd mount point exported.
Then, while adding debugging code to the stuff running on the NFS
client, I found that I no longer needed to do it at all. :-)

So it's not a practical issue for me now.  It might turn into one at
some as-yet-indeterminate future point, but I'll deal with that
if-and-when it happens.

Thanks for the note!



4.0.1 i386 wedged

2013-05-21 Thread der Mouse
I don't know if anyone recalls enough 4.x to say anything useful here,
but if anyone does and cares to comment...

The machine: 4.0.1 i386.  Two CPUs.  (Kernel has MULTIPROCESSOR,
MPBIOS, and APM_NO_IDLE turned on.)

NFS-mount a filesystem with a big (half-terabyte) file in it.  vnconfig
that onto vnd0.  Mount /dev/vnd0d.  Write stuff to it.

Machine locks up.  Responsive to ping, but userland is totally wedged,
doesn't even respond to RETURN on the console.  Break into ddb and do
ps and find pagedaemon is waiting on emergva.  It appears to be
deadlocked against itself; my impression from the stack trace is that
it's trying to page something out and finds itself wanting to page
something in to do so.

Is this a case of "don't do that, then", or should this work and I just
need to track down a bug?



Re: write alignment matters?

2011-06-27 Thread der Mouse
>> That "what it is reasonable for a disk to do" consensus *is* the
>> interface spec I was talking about, not the de-jure non-spec of "you
>> get whatever the device (via its driver) feels like giving you".

> That's sort of the point.  If you want "what it is reasonable for a
> disk to do" you should be using the block device [...].

> The raw device is supposed to be just that: a raw interface to the
> device.  It gives you access to all the mis-behavior of the device
> with all its gory niggling little details.

That's a nice theory.  But it's not the historical practice.

Typically raw device drivers do a nontrivial amount of cleaning up of
the hardware's interface, such as dealing with bounce buffers and
poking device registers.



Re: write alignment matters?

2011-06-25 Thread der Mouse
>> This would mean that raw devices as interfaces to disks are
>> essentially useless.
> Not at all, as history has proven, as that's what the rule has always
> been.

Only because

> [I]n the real world, the manufacturers don't make products that they
> can't sell, and people don't buy products that don't work (or not
> very many of them), so anything that you're (likely to) buy is almost
> certainly going to be reasonably compatible with what has gone
> before,

That "what it is reasonable for a disk to do" consensus *is* the
interface spec I was talking about, not the de-jure non-spec of "you
get whatever the device (via its driver) feels like giving you".



Re: write alignment matters?

2011-06-24 Thread der Mouse
>> [...], there's a lot of history behind the notion that userland
>> alignment of write() buffers affects, at most, performance, to the
>> point where I consider it part of the interface.
> Not on access to raw devices it isn't, and never was - what Erik Fair
> said [...] was 100% correct - if you're using a raw device, it is up
> to the application to meet whatever the requirements of that
> particular device are, [...]

This would mean that raw devices as interfaces to disks are essentially
useless.  It becomes impossible to write _anything_ that works on "raw
disks", because you don't know what restrictions might be demanded by
the next disk device to come along.  Indeed, it's entirely possible
that two devices might make mutually incompatible demands, making it
impossible to support both of them with the same code.

Do we really want fsck_ffs_sd, fsck_ffs_xy, fsck_ffs_ld, fsck_ffs_wd,
etc?  That's where this is going.  I think it would be a major step
backwards philosophically and would definitely be a major step
backwards in practice.

And, for the specific case that started this off, that's not what's
going on; it does write 64K of the 4M, so it clearly doesn't mind the
alignment.



Re: write alignment matters?

2011-06-24 Thread der Mouse
>> Oh, I was talking about current NetBSD where block devices are a
>> second class citizen, soon to be abolished if someone finds enough
>> round tuits.
> Yes, so it keeps being said.  It would truly be a pity to see that
> happen.

Why?



Re: write alignment matters?

2011-06-22 Thread der Mouse
> You have to use multiple of the sector size here (usually 512 bytes,
> but can be 4k now) so I don't find it unreasonable to also expect the
> buffer to be aligned on the machine's natural integer size.

If the interface were being designed now, I wouldn't either.

But the interface is much older than that, and, even if it's not
codified, there's a lot of history behind the notion that userland
alignment of write() buffers affects, at most, performance, to the
point where I consider it part of the interface.



Re: write alignment matters?

2011-06-22 Thread der Mouse
>> [...disk buffer alignment issues...]
> I suspect most disk controllers will have issues if the buffer is not
> aligned on at [least] 16bits.

Perhaps.  In such cases, something else in the I/O stack - the device
driver, most likely, since this is a device-hardware issue - has to
compensate.  Paying an extra copy penalty for a misaligned buffer may
be annoying, but it's better than getting I/O errors or other weird
violation of the interface's semantics.
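
As a rough illustration of the "extra copy penalty" idea, here is a userland sketch of the same compensation a driver could do internally. It is not taken from any NetBSD driver; the name write_bounced and its align parameter are hypothetical, and it assumes posix_memalign() is available on the system in question.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical userland stand-in for driver-side compensation: if buf is
 * not aligned to `align`, copy it to an aligned bounce buffer and write
 * that instead.  align must be a power of two >= sizeof(void *).
 */
static ssize_t write_bounced(int fd, const void *buf, size_t len, size_t align)
{
    void *bounce;
    ssize_t n;
    int e;

    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    if (((uintptr_t)buf & (align - 1)) == 0)
        return write(fd, buf, len);     /* aligned already: no extra copy */
    e = posix_memalign(&bounce, align, len ? len : 1);
    if (e != 0) {
        errno = e;                      /* posix_memalign returns the error */
        return -1;
    }
    memcpy(bounce, buf, len);           /* the extra copy penalty */
    n = write(fd, bounce, len);
    e = errno;
    free(bounce);
    errno = e;                          /* keep write()'s errno across free() */
    return n;
}
```

The caller sees ordinary write() semantics either way; only the cost differs, which is the behaviour argued for above.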

The issue for me is not that the hardware does or doesn't have
alignment restrictions.  It's that they show through to userland (and
in a very peculiar way).  As someone mentioned upthread, it's possible
what's going on is that this hardware has alignment issues (at least
when used with our sequencer program) the driver _doesn't_ deal with.



Re: write alignment matters?

2011-06-21 Thread der Mouse
>>> It's a 53c8xx:  we load the "firmware" into it from our own driver.
>> I just had a look at esiop.ss, and it looks high-level enough that
>> it's plausible to me that the problem is with whatever's backing
>> that code, whether silicon or ROMed firmware or whatever.
> I just verified that I see the same on a 53c875 using esiop(4), and
> it also does the same thing with siop(4).

That's good news, in that it means the problem is probably relatively
well-contained and probably common to all chips of a specific type,
possibly a family of closely related types.  This makes it more likely
it'll get found and fixed. :)

> I also found that it works fine if the buffer is at an even address
> and only failed on odd addresses.

That's not what I saw; my initial failure was with a buffer that was
aligned to, IIRC, a multiple of 32 bytes (I think the address in hex
ended in a0).  The test program I quoted uses an odd address for the
misaligned test - I wanted it to be as misaligned as possible - but, at
least on the hardware I initially saw this on, that's not necessary.
(It may help, though.)

The SCSI geek I mentioned wrote back saying this could be caused by
bugs in the sequencer program; he pointed me at the Linux sequencer
program, which I'll compare to NetBSD's esiop.ss and see if I can see
anything useful there.  (He's worked with this stuff mostly on Linux,
so that's what he knows.  Come to think of it, I wonder if I can find a
Linux livecd to try my test on this hardware under the penguin.)



Re: write alignment matters?

2011-06-21 Thread der Mouse
> It's a 53c8xx:  we load the "firmware" into it from our own driver.

I just had a look at esiop.ss, and it looks high-level enough that it's
plausible to me that the problem is with whatever's backing that code,
whether silicon or ROMed firmware or whatever.

I've sent an email to the relevant person; we'll see what, if anything,
it leads to.



Re: write alignment matters?

2011-06-21 Thread der Mouse
>> It does, however, appear to have something to do with the hardware
>> [...]
> It's most likely the controller hardware (or firmware).  The driver
> doesn't seem at all interested in the data alignment, and I don't
> recall other parts of the I/O stack caring.

That sounds reasonable.

> To verify you should test your program against some other (preferably
> not emulated) disk controller.

I think I have an add-in card that also shows up as esiop.  That could
help me tell whether it's the hardware or something weird in the
driver.

Actually, I got the machine from a friend-of-a-friend who was a pretty
hardcore SCSI geek in a former life (he now repairs organs, the musical
kind).  He would probably be the right person to ask about this; it's
not totally implausible he might even have source to the firmware (I
doubt he could release it to me if so, but he could look at it).



write alignment matters?

2011-06-20 Thread der Mouse
I've just run into something (on 4.0.1) which looks to me as though
write() buffer alignment matters.  This sounds to me like a bug, but it
appears to have something to do with the hardware, and I'd appreciate
any thoughts on how I might best track it down.

I tried to run one of my tools (a disk verifier which, as I ran it,
just writes distinctive data to a disk).  I ran it as

disk-check -m sd1d -w /dev/rsd1d

which writes stuff to /dev/rsd1d.  I got messages indicating write
errors.  This seemed weird enough I started investigating further.
ktrace indicated that disk-check was doing

  1019  1 disk-check CALL  open(0xbfbfeb8a,1,0)
  1019  1 disk-check NAMI  "/dev/rsd1d"
  1019  1 disk-check RET   open 3
  1019  1 disk-check CALL  __fstat30(3,0xbfbfe998)
  1019  1 disk-check RET   __fstat30 0
  1019  1 disk-check CALL  ioctl(3,DIOCGDINFO,0xbfbfe804)
  1019  1 disk-check GIO   fd 3 read 404 bytes
[actual data snipped for brevity]
  1019  1 disk-check RET   ioctl 0
  1019  1 disk-check CALL  __sigaction_sigtramp(SIGINFO,0xbfbfe970,0,0xbbb730c4,2)
  1019  1 disk-check RET   __sigaction_sigtramp 0
  1019  1 disk-check CALL  lseek(3,0,0,0,0)
  1019  1 disk-check RET   lseek 0
  1019  1 disk-check CALL  write(3,0x804baa0,0x40)
  1019  1 disk-check GIO   fd 3 wrote 0 bytes
   ""
  1019  1 disk-check RET   write 0

which of course isn't right.  I tried with dd, which worked; the only
difference I could see in kdump output that looked even barely possibly
significant was that dd's buffer was aligned to a multiple of 4K.  So I
taught disk-check to align its write buffer and the weird write
behaviour vanished.
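
For concreteness, aligning a write buffer can be as simple as over-allocating and rounding the pointer up. The sketch below is not disk-check's actual code; align_up is a hypothetical helper, and the 4096 in the usage note is only taken from the observation about dd's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from disk-check: return the first address
 * at or after p that is a multiple of align.  align must be a power of two.
 */
static unsigned char *align_up(unsigned char *p, size_t align)
{
    uintptr_t a = (uintptr_t)p;

    assert(align != 0 && (align & (align - 1)) == 0);
    /* distance to the next boundary; 0 if p is already on one */
    return p + ((align - (a & (align - 1))) & (align - 1));
}
```

To use it, allocate the wanted size plus align-1 spare bytes and write from align_up(raw, 4096) rather than from raw.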

Sometimes, depending on I'm not sure what, the write writes 64K of data
(but then an attempt to write 4128768 bytes from a buffer address 64K
further advanced usually returns 0 immediately).
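
The numbers fit a caller that retries after a short write: 4194304 - 65536 = 4128768. A minimal sketch of such a loop follows, assuming (this is not confirmed to be what disk-check does) that a 0 return with bytes still pending is treated as an error rather than retried forever. The name write_all and the choice of EIO are mine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until all len bytes are out.
 * Returns len on success, -1 on error.  A 0 return from write() means no
 * progress is being made, so it is reported as EIO instead of looping.
 */
static ssize_t write_all(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just try again */
            return -1;
        }
        if (n == 0) {
            errno = EIO;                /* assumed choice of error code */
            return -1;
        }
        done += (size_t)n;              /* short write: advance and retry */
    }
    return (ssize_t)done;
}
```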

It does not appear to be related to any of my kernel hacks; at least,
when I boot the stock distribution GENERIC kernel, I see the same
(mis)behaviour from my test program (below) as under my kernel.

It does, however, appear to have something to do with the hardware
(personally, I suspect the disk driver); if I try it on another 4.0.1
machine on vnd0d backed by an ordinary file, it doesn't misbehave, and
if I try it on a real disk partition on that machine (but a
non-RAW_PART partition on a wd drive, rather than sd1d), it doesn't
misbehave either.

I wrote a small test program.  I'd be interested to hear if anyone else
can get it to misbehave, or if anyone has any suggestions for what
could possibly be behind this.

The hardware on which this misbehaves is an HP NetServer LP 1000r; the
disk is  in an external enclosure, on
the built-in SCSI interface.  Full dmesg is below.


Here's that test program, and the results I see:

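# cat z.c
#include <stdio.h>   /* header names did not survive archiving; these */
#include <stdlib.h>  /* four are assumed from what the code needs */
#include <stdint.h>
#include <unistd.h>

#define MISALIGN 2321
#define BUFSIZE (1<<22)
#define ALIGN 4096

static unsigned char dbuf[BUFSIZE+ALIGN+MISALIGN];

int main(void);
int main(void)
{
 unsigned char *dp_align;
 unsigned char *dp_misalign;
 int i;

 /* The archive lost everything from the loop condition down to the
    shell redirect; the rest of the body is an assumed reconstruction. */
 for (i=0;i<(int)sizeof(dbuf);i++) dbuf[i] = i & 0xff;
 /* first ALIGN-aligned address inside dbuf */
 dp_align = dbuf + ((ALIGN - ((uintptr_t)dbuf % ALIGN)) % ALIGN);
 /* MISALIGN is odd, so this is as misaligned as possible */
 dp_misalign = dp_align + MISALIGN;
 fprintf(stderr,"aligned write: %d\n",(int)write(1,dp_align,BUFSIZE));
 fprintf(stderr,"misaligned write: %d\n",(int)write(1,dp_misalign,BUFSIZE));
 return(0);
}
# ./z > /dev/rsd1d
aligned write: 4194304
misaligned write: 65535
# 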

And the dmesg I promised (this is from my kernel, but, as I said, the
stock 4.0.1 GENERIC on this same hardware misbehaves the same way,
though I need to boot -c and "disable acpi" to get it to boot at all):

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 4.0.1 (NETSERVER) #0: Mon Jun 20 03:50:52 EDT 2011
mouse@:/home/mouse/kbuild/NETSERVER
total memory = 1023 MB
rbus: rbus_min_start set to 0x4000
avail memory = 995 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xfd8e2
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (HP   LP 1Kr/2Kr  )
cpu0 at mainbus0: apid 3 (boot processor)
cpu0: Intel Pentium III (686-class), 1266.79 MHz, id 0x6b1
cpu0: features 383fbff
cpu0: features 383fbff
cpu0: features 383fbff
cpu0: "Intel(R) Pentium(R) III CPU family  1266MHz"
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 512 KB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: calibrating local timer
cpu0: apic clock running at 133 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 0 (application processor)
cpu1: starting
cpu1: Intel Pentium III (686-class), 1266.71 MHz, id 0x6b1
cpu1: features 383fbff
cpu1: features 383fbff
cpu1: features 383fbff
cpu1: "Intel(R) P

Re: Silly question about ktrace(1) and non-root users

2011-06-20 Thread der Mouse
> %ps -uw28755
> USER     PID %CPU %MEM VSZ RSS TTY STAT STARTED    TIME COMMAND
> buhrow 28755  0.1  0.0 408 932 ?   S    24May11 0:03.27 sshd: buhrow@ttyp2
> %whoami
> buhrow
> %ktrace -p 28755
> ktrace: file ktrace.out, pid 28755: Operation not permitted

See sys/kern/kern_ktrace.c.  This is probably coming from either the
filesystem code in sys_ktrace(), if the problem is trying to open the
file, or from kauth_authorize_process(KAUTH_PROCESS_CANKTRACE) in
ktrcanset(), if the problem is coming from permission to trace the
process.

I'd check the former first, because it's easier.  But if that's not it,
look at sys/secmodel/bsd44/secmodel_bsd44_suser.c for
KAUTH_PROCESS_CANKTRACE (I'm assuming you're not using your own
secmodels here, or you surely would have mentioned it).  My guess would
be that this is a case of a set-ID program doing a setuid() but not
execing and the process thus still being marked as set-ID.  If you
really want to track this down, you could do something like sprinkle
printfs in the KAUTH_PROCESS_CANKTRACE code to find out which test is
responsible.



Re: psaux driver

2011-06-07 Thread der Mouse
> a half a year ago I wrote a simple psaux driver which exports raw ps
> port to userspace in the way in which Linux used to do with a psaux
> driver.

> I need it in order to handle my synaptics touchpad correctly using
> the available userspace X driver.

I dunno about X drivers, but I addressed the desire to get at my
Synaptics directly a bit differently: I made dev/pckbport/synaptics.c
also present a character-device interface (actually, I added
.../synapticsdev.c with .../synaptics.c having just minimal hooks).
This interface can either copy events on their way to wscons or steal
them entirely so wscons never sees them, depending on how it's opened
(which minor device); it returns raw Synaptics 6-byte packets.

I'm not sure if this is operationally equivalent to what you have, but
it might be of some interest.  Currently, the only ways it's available
are either (1) directly from me or (2) via my gitification of my
changes to 4.0.1; git clone
git://ftp.rodents-montreal.org/Mouse/netbsd-fork and look at commit
66383ba7adc6a6507f6d194661820e5b07e48e1b.  (You'll need a gig or so of
space if you want a checked-out tree and something like a third of a
gig for git's stuff.)

> The disadvantage is that it bypasses the wscons layer, so it is
> inherently not elegant.

Well, so does mine.  But then, wscons is rather Procrustean; it tends
to pare away capabilities until what's left fits the wscons model.
This, of course, has a good side and a bad side, but it does mean that
I don't mind end-running around wscons to get at capabilities its model
doesn't support.



Re: RAID stripe size (was: 5.1 RAID5 write performance)

2011-06-06 Thread der Mouse
> So, what's the advantage of a larger sectPerSU?

Larger is not necessarily better.  Larger than the typical write size
(which is usually the filesystem block or frag size) is actually a
*dis*advantage, because it means that common writes force RMW cycles.

Much smaller than the typical access size is a different disadvantage.
In particular, accesses larger than sectPerSU force additional drives
to pay seek, rotational latency, and contention penalties.  (This
applies to both read and write, though in slightly different ways.)
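
To put numbers on the tradeoff, here is a small arithmetic sketch. It is not RAIDframe code: su_touched and full_stripe are hypothetical names, and "full-stripe writes avoid RMW" is the usual RAID 5 rule of thumb, assumed here rather than checked against the RAIDframe source.

```c
#include <assert.h>

/*
 * Number of stripe units (and so, roughly, component drives) touched by
 * an access of `count` sectors starting at sector `start`.
 */
static unsigned long su_touched(unsigned long start, unsigned long count,
                                unsigned long sectPerSU)
{
    assert(sectPerSU > 0 && count > 0);
    return (start + count - 1) / sectPerSU - start / sectPerSU + 1;
}

/*
 * Nonzero if the access covers only whole stripes, so parity can be
 * computed from the new data alone.  ndata is the number of data
 * (non-parity) columns.
 */
static int full_stripe(unsigned long start, unsigned long count,
                       unsigned long sectPerSU, unsigned long ndata)
{
    unsigned long stripe = sectPerSU * ndata;

    assert(stripe > 0 && count > 0);
    return start % stripe == 0 && count % stripe == 0;
}
```

With four data columns and an aligned 32-sector (16K) write: at sectPerSU 8 the write is a full stripe but touches four stripe units, so every data drive gets involved; at sectPerSU 128 it touches one stripe unit but is a partial stripe, hence RMW.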



Re: sleeping instead of ENOBUFS on write to socket?

2011-05-27 Thread der Mouse
> I'm not sure what happens for blocking datagram sockets.

The historical practice is that they're "best effort": writing never
blocks.  Some conditions responsible for packet drop are reflected in
the write()/send()/etc return value; others aren't - though I think for
AF_LOCAL they're all the former.  Blocking on SOCK_DGRAM matters only
when trying to read when there's nothing there.

Is that how it _should_ be?  That's debatable.  But if you make
SOCK_DGRAM writes block without some kind of explicit request (like a
setsockopt), you will break a fair bit of code.



Re: sleeping instead of ENOBUFS on write to socket?

2011-05-27 Thread der Mouse
> [...], as local SOCK_DGRAM sockets will return ENOBUFS when kernel is
> out of buffer memory.  As I understand, there is no way to sleep
> instead.

I've never investigated it in detail, but my understanding is that
that's the usual semantic for SOCK_DGRAM.

> I thought this was specific to SOCK_DGRAM, since SOCK_DGRAM is not
> supposed to be reliable.  I therefore added support for
> SOCK_SEQPACKET (SOCK_STREAM is not relevant, I need atomicity) in the
> kernel, but I face the very same problem, except that SOCK_SEQPACKET
> local sockets will return EMSGSIZE when kernel is out of buffer
> memory.

Does SOCK_SEQPACKET even work?  socket(2) on a handy 4.0.1 machine says
"presently implemented only for PF_NS"; someone must have added it to
AF_LOCAL, either after 4.0.1 or without updating socket(2).  Neat.

My reading is that SOCK_SEQPACKET is supposed to sleep: "...sequenced,
reliable, two-way connection-based data transmission path for
datagrams...".

> The questions:

> - did I miss a way to sleep on write when kernel is out of buffer
> memory?

Not AFAIK, FWTMBW.

> - if this is impossible, what about adding a socket option for that?

Good idea, I'd say.  (Not necessarily easy; I haven't looked at the
code.  But good.)

> - would it make sense to have such an option be the default for
> SOCK_STREAM and SOCK_SEQPACKET?

I think it is, for SOCK_STREAM.  I've certainly not run into a case
where it errors rather than sleeping when sleeping could alleviate a
resource shortage.

For SOCK_SEQPACKET, yes, I think it should be the default.

Might even make sense to implement it for AF_LOCAL SOCK_DGRAM.



Re: Proposal: killpeer(2)

2011-05-15 Thread der Mouse
> Putting that aside for a moment, let's pretend we had a stream socket
> in this application instead of a datagram one.  I think the right
> thing to do at that point would be to add a killpeer() syscall, which
> sends a signal to the other process associated with a file
> descriptor.

With what permissions checking?

If the answer is "same as kill(2)", then you may find it crippled
unless daemons run as root, which of course is undesirable.

If the answer is "none", then suddenly processes using vanilla AF_LOCAL
sockets can get signals from peers that formerly couldn't send them
signals, thus creating DoS potential against system processes.

I'm not sure what other options exist.  Perhaps it could work only if
the other side has agreed to receive signals, with a distinctive error
if not?  (Trying to send signal 0 could probe for this.)

Another thought: what if there is other than exactly one process on the
other end?  Even for SOCK_STREAM, this is possible; multiple processes
can have file descriptors referring to a single open file table entry,
by means such as fork() or SCM_RIGHTS messages.  There may even be no
such process, if the peer socket has been closed, or if all references
to it are sitting in SCM_RIGHTS buffers in other sockets.  (Process X
has fd, sends it as SCM_RIGHTS to process Y, but X closes the fd before
Y receives the message - who is "the peer process"?  What if there are
multiple Ys who have access to the socket and could receive the
SCM_RIGHTS-bearing message?)
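
The fork() case is easy to demonstrate. In this sketch (function names are mine, purely illustrative) two processes each send their pid over the same end of one AF_LOCAL stream socket; from the other end there is no single "peer process" to point at.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send our own pid down fd; returns 0 on success, -1 on failure. */
static int put_pid(int fd)
{
    pid_t self = getpid();

    assert(fd >= 0);
    return write(fd, &self, sizeof self) == (ssize_t)sizeof self ? 0 : -1;
}

/*
 * Returns 1 if the far end saw data from two different processes on the
 * same socket, 0 if not, -1 on error.
 */
static int two_senders_one_end(void)
{
    int sv[2], st;
    pid_t kid, seen[2];

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0)
        return -1;
    kid = fork();
    if (kid < 0)
        return -1;
    if (kid == 0)       /* child: sv[0] is the same open file as the parent's */
        _exit(put_pid(sv[0]) == 0 ? 0 : 1);
    if (waitpid(kid, &st, 0) < 0 || !WIFEXITED(st) || WEXITSTATUS(st) != 0)
        return -1;
    if (put_pid(sv[0]) < 0)
        return -1;
    /* the "other side": both pids arrive on the one socket */
    if (read(sv[1], &seen[0], sizeof seen[0]) != (ssize_t)sizeof seen[0] ||
        read(sv[1], &seen[1], sizeof seen[1]) != (ssize_t)sizeof seen[1])
        return -1;
    close(sv[0]);
    close(sv[1]);
    return seen[0] == kid && seen[1] == getpid();
}
```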

> This is reasonably general -- it would be useful for ptys and pipes
> as well, I think.  I think it would be a very useful tool for cleanup
> of potentially misbehaving peers in many local-IPC client/server
> applications.

I'm concerned about possible abuses of the capability.  I don't think
signals are an appropriate response to "misbehaving peers"; a more
appropriate response is generally to close the communications channel
to the misbehaving peer.

Even in the case that brought this up, I would say the right answer is
not to provide a way to kill off the process, but rather redesign the
communications mechanisms so there is no need to kill off the process.
While I don't know fuse beyond what manu@ said on Saturday, it seems to
me that the right answer is to provide something perfused can do to
ensure that it doesn't matter whether glusterfs sticks around, to break
whatever connection exists that's allowing glusterfs's persistence to
interfere with the next mount.  (Or, possibly, not something perfused
needs to do; it's armchair quarterbacking to an extent, but it seems to
me that this should be part of the unmount operation.)



Re: statvfs() sleeps forever on tstile

2011-05-08 Thread der Mouse
>> What is the difference between UFS1 and UFS2, and what is the
>> difference between FFS v1 and FFS v2?
> As I understand, FFS sits on top of UFS.

That could be part of my confusion, then.  Speaking strictly from a
personal-experience historical perspective, FFS was the Berkeley "Fast
File System", added no later than 4.2 (can't recall whether it was in
4.1c) and UFS was Sun's name for FFS when they imported it to SunOS
(this was long before Solaris).  I've never totally understood the
distinction between the two as it exists in NetBSD.



Re: statvfs() sleeps forever on tstile

2011-05-08 Thread der Mouse
>> ffsv1 or v2?  [...]
> I suspect this is not about UFS1 vs UFS2 but FFS1 (as obtained by
> newfs without -O) vs FFS v2 (as obtained by newfs -O 2).

Okay, now I'm confused.

What is the difference between UFS1 and UFS2, and what is the
difference between FFS v1 and FFS v2?  I've been thinking they were the
same thing described using different terminology, but that's
inconsistent with what you say here (and a few other indications I've
seen, none of them strong enough to really bring the distinction into
sharp relief the way this does).



Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread der Mouse
> The problem is that there might be some ports whose MAXPARTITIONS is
> still 8 and such ports can't use type 8.

Nothing says fd.c has to use MAXPARTITIONS (nor the macros built using
it) when breaking up the device minor number.
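
For instance, the driver could carve up the minor number with its own fixed field widths. The macros below are a hypothetical layout, deliberately prefixed XFD_ so as not to suggest they are what fd.c contains; the 4-bit type field is an arbitrary choice that leaves room for type 8.

```c
#include <assert.h>

/*
 * Hypothetical minor-number layout, independent of MAXPARTITIONS:
 * the low XFD_TYPEBITS bits select the type, the rest the unit.
 */
#define XFD_TYPEBITS 4
#define XFD_NTYPES   (1 << XFD_TYPEBITS)
#define XFD_UNIT(m)  ((m) >> XFD_TYPEBITS)
#define XFD_TYPE(m)  ((m) & (XFD_NTYPES - 1))
#define XFD_MINOR(unit, type) (((unit) << XFD_TYPEBITS) | (type))

/* Build a minor number, checking that both fields are in range. */
static int xfd_minor(int unit, int type)
{
    assert(unit >= 0);
    assert(type >= 0 && type < XFD_NTYPES);
    return XFD_MINOR(unit, type);
}
```

The catch with any such change is that existing /dev nodes encode the old layout, so they would need to be remade.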



Re: RAIDframe component replacement

2011-04-08 Thread der Mouse
>> [...]
> So, do you suggest replacing sd0 without scsictl detach/scan?

Yes.

> While this could work in my special case where the replacement disc
> is the exact same model as the phased-out one and already has a
> disklabel on it, I guess the kernel will get confused otherwise?

In my experience, no.  While most of my SCSI work has been done with
older NetBSD, my experience has been that a completely closed drive has
all knowledge beyond its existence flushed; on first open, the label is
reloaded.  I'm not sure whether the size and geometry are reprobed, but
I think so.  (I conjecture this was done to make dealing with
removable-media drives easier.)

Don't depend on 4.0.1 to behave that way, though, without testing it.
Even if I'm remembering right, the SCSI subsystem may well have been
improved enough to cause a regression in that respect.



Re: RAIDframe component replacement

2011-04-08 Thread der Mouse
>> then, after replacing the failed sd0 with the new sd0

> But with scsictl detach/scan, I suppose?

If that's an excerpt from me: no.

> I've done this several times with non-hotpluggable SCSI hardware
> where I had to power off anyway.

I've done it with non-hot-pluggable hardware _without_ powering off.
If the SCSI bus is in use by any other devices, I will usually break to
ddb (to ensure the bus is idle) during the unplug-and-replug.

> But with SCA, I'm unsure, whether, after detaching sd0 (and sd1 still
> there), a newly scanned sd will become sd0 or sd2?

I'm not sure either.  My guess would be that it would be sd2, but that
is just a guess.



Re: RAIDframe component replacement

2011-04-07 Thread der Mouse
>> None of these can help with that.  Whenever your RAID1 is running
>> single-member, you lose it if the live member fails.

> Yes, of course.

> But it's currently running 2-member.  Is there a way to temporarily
> run it three-member and then remove the error-prone component?

Oh, I see what you mean!

I don't think there is.  As far as I can tell, RAIDframe (at least as
of 4.0.1, which IIRC is what you said you're using) does not support
RAID 1 with more than two members.  (Fixing that would be my preferred
fix, actually; even just making it two-or-three members would help, as
then any number of members could be done with a fixed-depth tree.  But
I would guess - not having looked - that going to three would be little
if any easier than going to an arbitrary number.)



Re: RAIDframe component replacement

2011-04-07 Thread der Mouse
[Please don't use paragraph-length lines for normal text!]

> I have a RAID1 consisting of sd0a and sd1a.  Now, sd0 sometimes fails
> with "hardware error", but reconstruction onto it is OK.  Of course,
> I want to replace the disc.  Luckily, I have a spare drive and
> everything is hotpluggable SCA and I have unused slots.

> It seems I have two options (given the spare disc I have has already
> been fdisk'ed and disklabel'ed):

> 1. Leave the two current discs in, insert the replacement disc,
> scsictl scan it (becoming sd2) and then add it as a hot spare via
> raidctl -a sd2a.  Then, raidctl -F sd0a which should begin a
> reconstruction on sd2a.

> 2. Do a raidctl -f sd0a (if sd0 hasn't been marked as failed
> already), then scsictl detach it and pull it out.  Then, substitute
> it with the replacement disc, scsictl scan (does it become sd0 then?)
> and raidctl -a sd0a.  Probably I have to raidctl -F component0 again
> in order for the reconstruction to begin.

Actually, as I think someone else pointed out, there's a third option:

3. raidctl -f /dev/sd0a raid0 (if it isn't already failed), pull the
drive, put in the replacement, and raidctl -R /dev/sd0a raid0.  In my
experience, this should work.  The hazards in hot-replacing a SCSI disk
are electrical, which is not an issue if your hardware is designed for
hot-plug, and data, which is not an issue provided sd0 is completely
closed before removal and not opened until the replacement is ready.
Provided it's not used for anything but that RAIDframe member, this
should be the case.

> Additionally, I would prefer the procedure that is safer against the
> remaining component (sd1) failing in the middle of it.

None of these can help with that.  Whenever your RAID1 is running
single-member, you lose it if the live member fails.

I've done RAID 11 (to coin a phrase), RAID1 atop RAID1.  In my case it
was three-member, not four-member, and this caused some trouble with
autoconfiguration; I've been thinking about possible ways to deal with
that, but haven't implemented anything yet.  It did mean, though, that
we could survive two failures without data damage, not just one.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: GSoC proposal: NetBSD Kernel Documentation Management Toolchain

2011-04-06 Thread der Mouse
>> [...] doxygen [...]
> As regards doxygen:  ow.

Quite.

> Have you ever actually tried to run a nontrivial part of the NetBSD
> tree through doxygen?

No.

I have, however, tried to use doxygen-generated "documentation" for
various other projects (this from the days when I was hacking on some
Android code).

The best I ever saw it get was mediocre.  At the bad end, I had to go
look at the code to figure out what it was even _trying_ to say.

> It does not actually understand C well enough to deal with very
> common things in our tree such as the queue.h macros.

Nor plenty of other things.  I don't recall details, but I recall there
was something on the order of it thinking there was a struct member
named "int".

An officemate of mine (far more of a "Linux and all things GNUish" fan
than I) pointed out that I shouldn't blame the tool for broken uses of
it - but when you see enough broken uses and no good uses, it's hard to
avoid seeing the tool as, at the very least, to be censured for
encouraging brokenness.  While it wouldn't, per se, be a bad thing to
consider doxygen-generated documentation, it quite possibly would be if
it encourages sloppiness such as I saw.  "Oh, I don't have to bother
writing good comments, doxygen will deal with generating documentation"
- I can't know whether that was actually the mindset, but it would have
been consistent with what I saw.  And I think introducing something
that encourages - or brings with it - anything like that mindset would
be a disaster in the making.  (Not that NetBSD doesn't have plenty of
those, but that's no reason to add one more.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: kernel bitreverse function

2011-04-03 Thread der Mouse
>   ((b * 0x80200802ull) & 0x0884422110ull) * 0x0101010101ull >> 32

>   ((b * 0x0802u & 0x22110u) | (b * 0x8020u & 0x88440u)) * 0x10101u >> 16

Maybe.  Do all machines NetBSD still cares about have fast-enough
multiply instructions?  Unless the multiply is unusually fast compared
to shift and mask instructions, I'd say it would likely have a hard
time beating

b = ((b & 0x0f) << 4) | ((b >> 4) & 0x0f);
b = ((b & 0x33) << 2) | ((b >> 2) & 0x33);
b = ((b & 0x55) << 1) | ((b >> 1) & 0x55);

> It is also worth allowing for cpus that can have a hardware
> instruction (and then do it in 1 clock!)

Yes, I'd say this definitely should support MD implementations.
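
For anyone wanting to compare the two approaches, here is a minimal
sketch of both as standalone C functions, checked against a naive
bit-at-a-time loop.  The function names are made up for illustration and
nothing here is taken from the NetBSD tree; which variant is faster
depends entirely on the machine, and an MD implementation could replace
either.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation: move one bit at a time. */
uint8_t bitrev8_naive(uint8_t b)
{
    uint8_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Shift-and-mask: swap nibbles, then bit pairs, then adjacent bits. */
uint8_t bitrev8_shift(uint8_t b)
{
    b = (uint8_t)(((b & 0x0f) << 4) | ((b >> 4) & 0x0f));
    b = (uint8_t)(((b & 0x33) << 2) | ((b >> 2) & 0x33));
    b = (uint8_t)(((b & 0x55) << 1) | ((b >> 1) & 0x55));
    return b;
}

/* The 64-bit multiply variant quoted above; the result is the low byte. */
uint8_t bitrev8_mul(uint8_t b)
{
    return (uint8_t)((((b * 0x80200802ull) & 0x0884422110ull)
        * 0x0101010101ull) >> 32);
}

/* Check both fast variants against the reference for every input. */
void bitrev8_selfcheck(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t want = bitrev8_naive((uint8_t)i);

        assert(bitrev8_shift((uint8_t)i) == want);
        assert(bitrev8_mul((uint8_t)i) == want);
    }
}
```

Calling bitrev8_selfcheck() once is enough to confirm the variants
agree; timing them on the target hardware is a separate exercise.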

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: high sys time, very very slow builds on new 24-core system

2011-03-23 Thread der Mouse
> I have a new machine with 24 2Ghz Opteron cores.  It has 32GB of RAM.

> Building with sources on a fast SSD ("preloaded" into the page cache
> before the build using tar > /dev/null) and obj, dest, and rel dirs
> on tmpfs, system builds are extraordinarily slow.

What are /tmp and /var/tmp on?

> The system takes about 20 minutes to build a netbsd-5 based source
> tree with -j24 -- about the same amount of time as an older 8-core
> Intel based system running netbsd-5 requires with -j8.

> All cores spend well over 50% time in 'sys', even when all or almost
> all are running cc1 processes.

> Does anyone have any idea what might be wrong here?

You're trying to use too recent a NetBSD? :-/

More seriously, this smells to me as though something is being
serialized, causing most cores to spend far too much time spinning
waiting for that something.  Does current still giantlock anything?  Is
your SSD reasonably performant?  (If compiler temporaries are going
onto disk, contention for the disk might be causing the sort of
serialization you see here - I think there are ordering constraints
that effectively serialize many of the steps involved when lots of
things are creating and deleting files in the same directory.  Hence
the question about /tmp and /var/tmp/.)

Maybe you need to instrument something to find out what's eating so
much system time?  Has kernel profiling support bitrotted?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: CMSG_SPACE: too clever by half?

2011-02-24 Thread der Mouse
[tls]
> The issue involves the way we arranged for binary compatibility
> across changes in the unix-domain file descriptor and credentials
> passing code.  [...]

> I would appreciate others' opinions on this.

Here's scm-rights.h from one of my SCM_RIGHTS-using programs.

I think the comments say it all.  They were written based on, IIRC, the
3.1 interface, but based on this thread it sounds as though it's only
gotten worse since then.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

#ifndef _SCM_RIGHTS_H_e8240238_
#define _SCM_RIGHTS_H_e8240238_

/* This file is in the public domain. */

/*
 * This file exists because the SCM_RIGHTS interface silently changed.
 *  It used to work the traditional way: pack file descriptors tightly
 *  as an array of ints immediately after the struct cmsghdr.  It was
 *  changed to use the layout corresponding to the CMSG_* macros, which
 *  don't exist historically; since the resulting layout is different
 *  on architectures for which __cmsg_alignbytes is greater than
 *  sizeof(int)-1, this is rather broken, breaking historically working
 *  code.  (The CMSG_* macros were invented for all the control guff
 *  IPv6 wants to shovel around.  What I don't understand is why they
 *  were imposed on SCM_RIGHTS messages.)  To make things worse, the
 *  CMSG_* macros are a broken API: they don't even try to support
 *  control messages that aren't in buffers aligned suitably for a
 *  struct cmsghdr (in particular, there is no way to find out where
 *  the data for a message falls relative to the message's beginning
 *  except by computing the data pointer as a function of the cmsghdr
 *  pointer).  This means you have to either use gcc extensions like
 *  __alignof__ or you have to do something like the CMSKIP macro
 *  below - unless you're willing to malloc the buffer (and, strictly,
 *  even that is not enough, since there is no guarantee that the
 *  required alignment is that of any object type).
 *
 * So we actually use macros CMSPACE, CMLEN, and CMSKIP, which are
 *  defined either suitably for the historical way (if
 *  NEW_CMSG_INTERFACE is not defined) or the least ugly way I've found
 *  for the CMSG_* way (if NEW_CMSG_INTERFACE is defined).
 */

#ifdef NEW_CMSG_INTERFACE
#define CMSPACE(x) (CMSG_SPACE((x)))
#define CMLEN(x) (CMSG_LEN((x)))
/*
 * It's gross to have to do this, but it's more or less forced upon us
 *  by the botched design of the CMSG_* interface.  The interface
 *  takes first steps towards a completely opaque interface, but
 *  botches it rather badly, resulting in neither an interface that can
 *  be used opaquely nor an interface that can be used transparently.
 *  Since the traditional interface is the transparent style, and the
 *  opaque style cannot be done without alignment issues (see below),
 *  this code goes in the transparent direction.  The result is not as
 *  portable as I'd like - it can depend on using a pointer past the
 *  end of an object, depending on the architecture - but I believe
 *  it's the least horrible of the available alternatives.
 *
 * The interface seems designed to overlay the structs cmsghdr onto the
 *  control buffer, but that demands the buffer be aligned, without
 *  providing any way to actually achieve that, which more or less
 *  compels its allocation with malloc(), that being the only portable
 *  way to correctly align a buffer whose alignment requirements are
 *  inaccessible.  (Using gcc extensions like __aligned__ and
 *  __alignof__, this can be worked around, but (a) that's gcc-specific
 *  and rather ugly, (b) it shouldn't be necessary, and (c) it can't be
 *  done without making assumptions for which there is no basis except
 *  knowledge of the implementation, like "the alignment necessary is
 *  the most strict of struct cmsghdr and the types to be stored in the
 *  buffer".)
 */
#define CMSKIP(x) ((char *)CMSG_DATA((x))-(char *)(x))
#else
#define CMSPACE(x) (sizeof(struct cmsghdr)+(x))
#define CMLEN(x) (sizeof(struct cmsghdr)+(x))
#define CMSKIP(x) (sizeof(struct cmsghdr))
#endif

#endif
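
For readers who have not used the interface, the following is a minimal
sketch of passing a single descriptor the CMSG_* way.  It is not taken
from the program the header above belongs to; send_fd and recv_fd are
hypothetical names.  The control buffer comes from calloc(), which is
the portable route to suitable alignment that the comments above
describe, and it is sized at run time in case CMSG_SPACE is not a
compile-time constant on a given system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one descriptor over a Unix-domain socket; returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';                 /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    size_t space = CMSG_SPACE(sizeof(int));
    void *ctl = calloc(1, space);     /* malloc-family memory is aligned */
    struct msghdr mh;
    struct cmsghdr *cm;
    ssize_t n;

    if (ctl == NULL)
        return -1;
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl;
    mh.msg_controllen = space;
    cm = CMSG_FIRSTHDR(&mh);
    assert(cm != NULL);               /* buffer holds at least one header */
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    n = sendmsg(sock, &mh, 0);
    free(ctl);
    return (n == 1) ? 0 : -1;
}

/* Receive one descriptor; returns it, or -1 if none arrived. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    size_t space = CMSG_SPACE(sizeof(int));
    void *ctl = calloc(1, space);
    struct msghdr mh;
    struct cmsghdr *cm;
    int fd = -1;

    if (ctl == NULL)
        return -1;
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl;
    mh.msg_controllen = space;
    if (recvmsg(sock, &mh, 0) == 1) {
        cm = CMSG_FIRSTHDR(&mh);
        if (cm != NULL && cm->cmsg_level == SOL_SOCKET &&
            cm->cmsg_type == SCM_RIGHTS &&
            cm->cmsg_len >= CMSG_LEN(sizeof(int)))
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    }
    free(ctl);
    return fd;
}
```

The data is copied with memcpy() through CMSG_DATA rather than by
casting to int *, so nothing beyond the cmsghdr itself relies on the
buffer's alignment.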


Re: partitionSizeHi in raidframe component label

2011-02-13 Thread der Mouse
>> do we need it on HEAD, too?
> Yes.  The patch applies cleanly to the very recent -current.

A suggestion from someone who's had to deal with such changes before:
change the struct element name (and change the macros to match), as a
way to guarantee catching any lingering direct references.

It's possible you've already done this privately.  But I'd suggest at
least considering doing it in the main tree.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: remove sparse check in vnd

2011-02-06 Thread der Mouse
>> Of course, still better would be to fix vnd, though I'm not sure
>> what the right fix would be.
> What's the problem?

The problem I am familiar with with vnd and sparse files - which
admittedly may not be the same as the one others have been talking
about - is that using a sparse file to back a vnd produces errors when
attempting to access the vnd blocks corresponding to holes in the
backing file.

I don't totally understand the problem, just the symptom.  I had a look
once and remember something about VOP_BMAP, but even that memory is
pretty fuzzy.

This is why NFS-remote sparse files don't provoke it: their sparseness
is completely hidden behind the NFS protocol.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: remove sparse check in vnd

2011-02-05 Thread der Mouse
> I think that the sparseness check should stay until vnd supports
> sparse files.

It already does, under some circumstances (eg, NFS-remote).

I _would_ prefer to see an override for cases like NFS, or where the
values returned trip the test even though the file is not sparse in the
sense vnd cares about.

Of course, still better would be to fix vnd, though I'm not sure what
the right fix would be.  (Hmm, read-only, synthesize a block of 0s;
read-write, write a block of 0s first?  That's the cheap and simple
fix, though ISTM ideally the write should be delayed until the block is
written in the read-write case.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Is there a way to obtain a machine's cache line size?

2011-01-21 Thread der Mouse
> m68010 doesn't have a real cache, but the very small instruction
> fetch fifo is blocked when a djnz is executed, so that a very short
> loop (1 other instruction) can be executed without repeated
> instruction fetches.

Reminds me of the KA630 (one of the MicroVAX-II CPU boards).  I was
building an emulator for it and it crashed in the ROM code.  Turns out
there's an instruction prefetch buffer.  Most ways of doing control
transfers flush it when appropriate.  But turning on the MMU does _not_
flush it; the ROM code turns on the MMU and then executes a handful of
instructions out of the prefetch buffer, the last one being a jump to
the same code at its now-MMU-mapped address.

The VAX OS sources I have make sure that the MMU is set up so the same
addresses work for physical and virtual before turning the MMU on,
perhaps because different VAXen differ in this regard.  But ROM code
can assume it's running on the hardware it's for.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Is there a way to obtain a machine's cache line size?

2011-01-21 Thread der Mouse
> 3. Dynamically allocating data to force its layout on a particular
> memory boundary will require dereferencing a pointer each time you
> need to access that data.  You will have better performance if you
> add padding to your data structure, so that layout is determined
> during compile time.

(a) Sometimes the data structure is dynamically allocated, or at least
accessed via a pointer, anyway.

(b) Depending on how heavy the cost of going to memory is, the cost of
the pointer indirection may be effectively zero, because it gets lost
in the stalls waiting for memory anyway.

(c) The cost of the pointer indirection, even when not swallowed up in
other effects, may be less than the cost of bad cache behaviour from
badly-aligned data structures.
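
To make the two options concrete, here is a small C11 sketch showing a
structure padded at compile time and the same structure allocated on a
line boundary at run time.  SKETCH_LINE is an assumed value chosen for
illustration (as noted elsewhere in this thread, 64 is not right for
every machine), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed line size, for illustration only; real hardware varies. */
#define SKETCH_LINE 64

/* Compile-time layout: alignment pads each counter to a full line. */
struct padded_counter {
    _Alignas(SKETCH_LINE) unsigned long value;
};

/* Run-time layout: n line-aligned counters reached through a pointer. */
struct padded_counter *alloc_counters(size_t n)
{
    assert(n > 0);
    /*
     * aligned_alloc() wants a size that is a multiple of the alignment;
     * sizeof(struct padded_counter) already is one.
     */
    return aligned_alloc(SKETCH_LINE, n * sizeof(struct padded_counter));
}
```

Either way each element starts on its own line; the only difference is
whether the base address is known to the compiler or loaded from a
pointer, which is the cost points (a) through (c) are weighing.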

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Is there a way to obtain a machine's cache line size?

2011-01-20 Thread der Mouse
>>> [cache line sizes, maybe not always 64?]
>> You are correct; to cite the one example I currently have swapped
>> into my brain, the Super-H used in the Dreamcast has 32-byte cache
>> lines (true of the I-cache and D-cache both).
> I'm curious why non-kernel components would care.

Choosing array layouts for efficient access?

Choosing a stride for prefetch operations?

> The question also gets amusing when the cache line size varies among
> the caches.  That's not all that common, but it certainly happens.

Indeed!

I wonder if there are caches with different cache line sizes for the
same type of fetch (instructions vs data).  One for opcode fetches and
one for inline constants maybe?  (On the Super-H, this could actually
make sense; "inline" constants are not truly inline - they may be
displaced significantly from the instructions accessing them.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Is there a way to obtain a machine's cache line size?

2011-01-20 Thread der Mouse
> I see there is a compile time constant CACHE_LINE_SIZE in
>  which currently seems to be always be set to 64, but
> I'm pretty certain that is not necessarily a correct value.

You are correct; to cite the one example I currently have swapped into
my brain, the Super-H used in the Dreamcast has 32-byte cache lines
(true of the I-cache and D-cache both).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Dates in boot loaders on !x86

2011-01-18 Thread der Mouse
>> Except that tells me whether the kernel being booted is recent, not
>> whether the bootloader doing the booting is.
> No.  It is the version in src/sys/sys/param.h. It doesn't have any
> relation to the kernel you are running or booting.

Oh, that kernel version!  That is of no use for my purposes.

> The point is that it is a non-changing, human readable identifier of
> the source tree that hopefully changes often enough to be able to
> tell two versions apart.

I care about bootloader timestamps when I'm hacking bootloader code and
want to be able to tell the difference between still running the
previous booter or the one I built just a few minutes ago - even if
"the previous booter" is the one I built thirty minutes ago in the same
bootloader-hacking run.  Being able to tell the 6.0 bootloader from the
5.2 bootloader, while perhaps important, is not what I'm talking about
here.

I don't do this often, but when I do there's not much substitute.

>> However, based on the discussion, it sounds as though this is not an
>> issue: [...]
> That depends.  Some platforms dropped them completely.

That sucks.  Well, if portmasters don't mind screwing over people
trying to actually hack on their ports' code, I guess it's their call.

> It is much simpler to consistently drop it.

Simpler?  Certainly.  My point is, it is a regression, a significant one
for me at least.  If you don't mind crippling people trying to work on
the bootloader, be my guest.  (If I had occasion to work on such a
bootloader, one of the first things I'd do would be to add something
functionally equivalent back, even if just a
manually-changed-when-I-care "hi, I'm not the previous version"
printf.)
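
A functional equivalent can be very small.  The sketch below uses only
the standard __DATE__ and __TIME__ macros; the function names and the
banner wording are hypothetical and not taken from any NetBSD
bootloader.  Anyone wanting bit-for-bit repeatable builds would compile
this out or pin the values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compile-time stamp of this translation unit, "Mmm dd yyyy hh:mm:ss". */
const char *build_stamp(void)
{
    return __DATE__ " " __TIME__;
}

/* What a booter might print on startup. */
void print_banner(void)
{
    printf("booter built %s\n", build_stamp());
}

/* Nonzero if a stamp read off the console matches this build. */
int same_build(const char *seen)
{
    assert(seen != NULL);
    return strcmp(seen, build_stamp()) == 0;
}
```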

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: xorg pci probing

2011-01-18 Thread der Mouse
>>> pci_device_is_boot_vga()
>> [...]
> While X is probing the pci devices the driver enables all pci vga
> devices.  That is why checking for the 'firmware enabled' one fails.

But checking for "the" firmware-enabled one is a broken idea.  There
may be fewer or more than one such, for one thing - nothing says the
firmware has to initialize any displays; nothing says it has to
initialize no more than one.

And it's a check X shouldn't be doing anyway.  Doing bus enumeration in
userland is insane; that's what we've got kernels for.  X should be
using the device - or devices - it's told to use; if not told anything
about what device to use, it should be picking a sensible default, like
the console (not necessarily boot) device, without caring about others.
I can, sort of, see doing direct access to the display as a PCI device
(_of course_ every display will be a PCI device!), but in no
circumstance can I see any excuse for poking at devices it isn't going
to use or walking all of any bus.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Dates in boot loaders on !x86

2011-01-17 Thread der Mouse
>> True.  What I use the timestamp for (when I use it at all, that is)
>> is answering "is the bootloader (or whatever) I'm getting the one I
>> just built and think I installed, or did something go wrong?".  The
>> difference between ten minutes old and two weeks old is important;
>> the difference between two weeks old and six months old is not.
> BTW, the kernel version is still included, so at least on -current
> you can normally detect the case of "pretty recent" and "a few month
> old" from that as well.

Except that tells me whether the kernel being booted is recent, not
whether the bootloader doing the booting is.

Unless you mean it includes the version string of the kernel it was
built under, in which case it can't allow me to tell the difference
between multiple builds and installs all performed under the same
kernel (which is probably what I'd be doing when hacking booters).
(That also looks as though it would make bit-identical repeatable
builds under different kernels difficult, though it's not clear to me
whether that matters.)

However, based on the discussion, it sounds as though this is not an
issue: it appears the datestamps are still there unless turned off (as
anyone wanting bit-for-bit-repeatable builds presumably will), meaning
the issue I raised simply does not exist.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Dates in boot loaders on !x86

2011-01-16 Thread der Mouse
> Just because you build any branch, including current, 3 month ago,
> doesn't say much about the sources.

True.  What I use the timestamp for (when I use it at all, that is) is
answering "is the bootloader (or whatever) I'm getting the one I just
built and think I installed, or did something go wrong?".  The
difference between ten minutes old and two weeks old is important; the
difference between two weeks old and six months old is not.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Dates in boot loaders on !x86

2011-01-16 Thread der Mouse
>>> [...], but I wonder if there is any point in keeping the date [in
>>> the bootloaders]
>> Personally, I find it useful to make sure I've got the bootloader I
>> think I do installed, usually after a change-and-rebuild.  "Wait a
>> minute, this one was built two months ago, not ten minutes ago; what
>> went wrong?".
> That's what the version number is supposed to tell you.

Any version number that could tell me what I use the datestamp for
would be functionally equivalent to a datestamp, except less easy to
verify.

If it were up to me, I'd go with Martin Husemann's idea.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Dates in boot loaders on !x86

2011-01-16 Thread der Mouse
> [...], but I wonder if there is any point in keeping the date [in the
> bootloaders]

Personally, I find it useful to make sure I've got the bootloader I
think I do installed, usually after a change-and-rebuild.  "Wait a
minute, this one was built two months ago, not ten minutes ago; what
went wrong?".

I don't often care about this, but when I do, there's not much
substitute I've found.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: TLB tiredown by ASID bump

2011-01-06 Thread der Mouse
> No.  The existence of ASIDs along with the hardware implementation is
> fundamentally a property of the MMU design.  Exposing this information
> outside of the MD code base (pmap(9)) breaks encapsulation.

In detail, yes.  In general, no.

I haven't looked at the ASID issue in detail.  But it sounds to me as
though it needs at least a few things elsewhere, though they can (and
probably should) be kept as general as feasible.  For example, every
process (or maybe lwp) needs to have an ASID hanging off it - but it
doesn't have to be done that way; I'd say it should be done by giving
each process (lwp) a pointer, or maybe a small block of data, which is
totally private to the pmap in use.  If pmap wants a call at process
exit, or syscall exit, or whatever, fine - but do it as a
general-purpose hook which the pmap in use can use to do whatever it
wants, not just ASID fiddling.
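
A rough sketch of the shape this could take, in ordinary userland C so
it can be compiled and poked at.  Every name here is hypothetical; this
is not NetBSD's pmap(9) interface or its proc/lwp structures, just an
illustration of an opaque per-process pointer plus general-purpose hooks
that an ASID-using pmap happens to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* MI view of a process: one pointer that only the pmap interprets. */
struct proc_sketch {
    int pid;
    void *md_private;
};

/* General-purpose hooks; the MI code never asks what they do. */
struct pmap_hooks_sketch {
    void (*proc_init)(struct proc_sketch *);
    void (*proc_exit)(struct proc_sketch *);
};

/* One possible MD side: keep an ASID in the private block. */
struct asid_state {
    unsigned asid;
};

static unsigned next_asid = 1;

static void asid_proc_init(struct proc_sketch *p)
{
    struct asid_state *s = malloc(sizeof(*s));

    assert(s != NULL);
    s->asid = next_asid++;
    p->md_private = s;
}

static void asid_proc_exit(struct proc_sketch *p)
{
    free(p->md_private);
    p->md_private = NULL;
}

const struct pmap_hooks_sketch asid_hooks = {
    asid_proc_init, asid_proc_exit
};

/* MI side: run whatever hooks the pmap in use supplied. */
void proc_create_sketch(struct proc_sketch *p,
    const struct pmap_hooks_sketch *h)
{
    p->md_private = NULL;
    if (h->proc_init != NULL)
        h->proc_init(p);
}

void proc_destroy_sketch(struct proc_sketch *p,
    const struct pmap_hooks_sketch *h)
{
    if (h->proc_exit != NULL)
        h->proc_exit(p);
}
```

A pmap with no per-process state would simply leave both hooks NULL,
and nothing outside the MD code would mention ASIDs at all.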

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: ioctl(2) vs sys/ioctl.h

2010-12-19 Thread der Mouse
> There is a bigger problem, the 'int' and 'void *' arguments might be
> passed in different ways than '...' is specified.

True, but it is not inherently a problem; it just complicates the
implementation of ioctl(), since it then has to not just pass down a
data pointer, but pass down enough information for the particular
ioctl's implementation to find whatever type the actual argument is.
(As it is, the implementation already depends on a nonportability,
basically that all pointer types are "the same".  It would explode
badly on a machine where some but not all pointer types are larger than
a machine word/register.)

> We only get away with it on our 64 bit archs because they all pass
> the first 3 arguments in registers.

...and are all byte-addressed.  If some pointers were 64 bits and
others were 128 (or, worse, 72 or 96 or some such), it would fall over
rather hard.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: ioctl(2) vs sys/ioctl.h

2010-12-18 Thread der Mouse
>>  int ioctl(int, unsigned long, ...);

> Most of our ioctl's take pointer arguments.  Some streams ioctls
> though take int arguments (ioctl(fd, I_FLUSH, FLUSHR) for example)
> and using void * as the argument would not compile cleanly.

Must FLUSHR (to continue your example) be defined as an int value?

Obviously there can be ABI issues, but they can be worked around the
same way you work around other compat ABI issues - or ignored, on
arches where ints and void *s are passed sufficiently compatibly.

Or perhaps ioctl could turn into something else after #including the
file that defines I_FLUSH and/or FLUSHR?

I'm just brainstorming possible ways to avoid inflicting a varargs
declaration on all users of ioctl.  I don't know whether there are any
issues which might break the above ideas - assuming anyone besides me
cares, that is.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: radix tree implementation for quota ?

2010-11-29 Thread der Mouse
> The prime number hash always looks good for most simple sequences.
> It looks extremely bad for other choices like {p, 2p, 3p, 4p, ...}.

Indeed.

> But for semi-random input, the simple prime hash can be a very bad
> hash function.

What's "semi-random" here?  For random input, that is, input
uncorrelated with anything else, all hash functions are equally good,
so the simplest/fastest is best.  But input is almost never truly
random.  (In this case, for example, it is heavily weighted towards
numbers near zero.)

Without any real data on what UID distribution looks like in practice,
we're all speculating in a vacuum here.

> If the keys are controlled by a third party, it is very easy to
> degrade the performance to a linear list.

Sure, but that's not a useful remark.  It's equally true of (n*K)>>32,
or for that matter any other easily invertible hash function.  If the
bucket count is small enough to make guessing feasible for the
attacker, it's true of any hash function cheap enough to be useful as a
hash function.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: radix tree implementation for quota ?

2010-11-29 Thread der Mouse
> [...] but as I understand it, the [NFS] server implements its own
> limits, not the client, which is as it should be.

The server should enforce limits if it is configured to have them, yes.
But the only reason I can see why the client shouldn't also be able to
put quotas on an NFS-mounted filesystem is implementation laziness.
(Justified laziness, perhaps; "laziness" carries a somewhat pejorative
implication that may be not entirely deserved here, but I'm having
trouble coming up with a better word.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: radix tree implementation for quota ?

2010-11-28 Thread der Mouse
> If you have issues with a power of 2 hash table size, you do not have
> a good hash function in first place.

Almost by definition.

> With a good hash function, using a non power of 2 size just tends to
> be slower.

Indeed - other things being equal.  Which they often aren't.

I think the point of a prime table size is that with a hash key that's
an integral type, as here, then you can often use a prime table size
and get decent results from the identity hash function - or, to think of
it another way, use value%size as your hash function and use the
identity mapping from hash values to table buckets.  A prime hash table
size gives about as good hashing under this sort of scheme as anything,
for most key distributions.

Yes, reduction modulo a prime is generally a substantially slower
operation than masking off all but the last N bits.  However, if it's
faster than the hash function you were going to use before doing the
masking, which is not infrequently the case, it's an overall win.
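
A small experiment makes the trade-off visible.  The sketch below uses
value%size as the whole hash function and reports the longest chain for
keys that are multiples of a fixed stride; longest_chain is a
hypothetical helper, and the strides and table sizes in the note after
it are arbitrary choices for illustration, not measurements of real UID
distributions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Longest chain when the n keys 0, stride, 2*stride, ... are placed by
 * key % nbuckets.  With a power-of-two nbuckets this picks the same
 * bucket as masking off all but the low bits.
 */
size_t longest_chain(uint32_t stride, size_t n, uint32_t nbuckets)
{
    size_t *count, worst = 0;

    assert(nbuckets > 0);
    count = calloc(nbuckets, sizeof(*count));
    assert(count != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t b = (uint32_t)(i * stride) % nbuckets;

        if (++count[b] > worst)
            worst = count[b];
    }
    free(count);
    return worst;
}
```

With 100 keys that are multiples of 16, a 16-bucket table puts all of
them in one chain while a 17-bucket table spreads them to chains of at
most 6.  Multiples of 17 reverse the picture, which is the {p, 2p, 3p,
...} weakness of a prime size: all 100 land in one bucket of the
17-bucket table.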

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: misuse of pathnames in rump (and portalfs?)

2010-11-24 Thread der Mouse
> Right. But if you want a guaranteed absolute path you should be able
> to do it by calling getcwd first.

Only if you accept breakage if the current directory no longer has any
name.

Of course, if you consider that acceptable, then fine.  I don't, not
for something as central as namei (though this looks as though you may
be talking about only certain filesystems, in which case it may be
acceptable).
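
The failure mode is easy to reproduce from userland.  The sketch below
changes into a freshly made directory, removes it, and reports what
getcwd() then says; the helper name is hypothetical, and the exact errno
is system-dependent (commonly ENOENT).  It assumes the caller supplies a
writable mkdtemp() template, and it restores the original working
directory before returning.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns the errno getcwd() reports once the current directory has
 * been removed, 0 if getcwd() unexpectedly succeeds, or -1 if the
 * setup itself failed.  tmpl is a mkdtemp() template and is modified.
 */
int getcwd_errno_after_rmdir(char *tmpl)
{
    char buf[PATH_MAX];
    int saved, result = -1;

    assert(tmpl != NULL);
    saved = open(".", O_RDONLY);      /* remember where we started */
    if (saved < 0)
        return -1;
    if (mkdtemp(tmpl) != NULL) {
        if (chdir(tmpl) == 0 && rmdir(tmpl) == 0) {
            /* The current directory now has no name. */
            errno = 0;
            result = (getcwd(buf, sizeof(buf)) == NULL) ? errno : 0;
        } else {
            (void)rmdir(tmpl);        /* best-effort cleanup */
        }
    }
    if (fchdir(saved) != 0)           /* go back to the original cwd */
        result = -1;
    (void)close(saved);
    return result;
}
```

Relative lookups still work in the nameless directory; it is only the
attempt to produce an absolute path that fails, which is why wiring
getcwd into every namei call would be a regression.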

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: misuse of pathnames in rump (and portalfs?)

2010-11-24 Thread der Mouse
>>> (Note that this is free -- it would require splicing a getcwd into
>>> every namei call.)
>> _Not_ free, I assume you mean?
> er right, silly editor... :-/

:-þ  I've had that happen to me often enough.

> Anyway, it looks as if it's not needed.

Just as well.  It has (since) occurred to me that it won't even work,
unless you're willing to accept a regression in that it will totally
break namei when the current directory no longer has any name.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: misuse of pathnames in rump (and portalfs?)

2010-11-24 Thread der Mouse
> However, I discovered today that rumpfs's VOP_LOOKUP implementation
> relies on being able to access not just the name to be looked up, but
> also the rest of the pathname namei is working on, specifically
> including the parts that have already been translated.

"Eww."  The rest of the path to the right, that makes some sense
(though I'd prefer the filesystem handle it using vnodes for the
intermediate directories).  The rest of the path to the left, that
makes no sense.  Not to me.

> When I asked pooka for clarification, I got back an assertion that
> portalfs depends on this behavior so I should rethink the namei
> design to support it.

If so, I believe portalfs is critically broken and should be pulled
until it's fixed.

> (1) does anyone think that a correct namei design should provide the
> namei pathname scratch space to the FS for its inspection during
> lookup?

Not me.

> (2) does anyone think that a correct namei design should provide a
> correct canonicalized full pathname for its inspection during lookup?

Not me.

> (Note that this is free -- it would require splicing a getcwd into
> every namei call.)

_Not_ free, I assume you mean?

> (3) Does anyone object if, as a way forward, I add an extra argument
> const char *partialpath to VOP_LOOKUP to provide the string that rump
> wants, until rump is fixed, and then revert it?

Well...I don't like it; such "temporary" hacks have a way of becoming
rather less temporary than they should.  Personally I think the right
fix is to fix the interface and, if this breaks rump, let it stay
broken until someone fixes it to not abuse the interface.

But I'm hardly the arbiter of such things.

> (4) vermilion.

Dead jackal brown.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: CVS commit: src/sys/arch/powerpc/oea

2010-11-15 Thread der Mouse
>> I've long felt this way: that, except for a very few examples like
>> <assert.h> that are defined to depend on context, the order of
>> #includes should not matter.  In particular, if multiple files must
>> be included, any of them may come first - so any file that generates
>> errors if it's included first needs fixing.
> assert.h is the *only* header that is not supposed to be idempotent.
> Pretty much anything else should be classified as bug.

I'm not talking about idempotency.  Idempotency is the question of
whether

#include <foo.h>
#include <foo.h>

is operationally equivalent to

#include <foo.h>

But what I'm talking about is whether

#include <foo.h>
#include <bar.h>

and

#include <bar.h>
#include <foo.h>

are operationally equivalent.

> Another item is that too many of our headers depend on non-standard
> compliant types polluting the namespace.

This is another issue I have with our include files, but it's
significantly more work to fix, so I've been working around it rather
than fixing it right.  (It's also not entirely clear what the right fix
is when more than two levels of software supplier are involved.)

> Nothing installed in /usr/include should depend on u_char for
> example.

Well...ideally.  But there's a big difference between (say) 
depending on u_char and  depending on u_char.
(I'm not happy about either, but substantially less happy about the
former.)

>> struct foo;
>> typedef struct foo FOO;

> Problem is that this requires a guard for the typedef if FOO is
> supposed to be defined by multiple files.

True.

Most of the cases I've seen for this push the incomplete struct
declaration and the typedef into a separate file, wrap it with an
idempotency guard, and then include that from files that need the
struct and/or type.  I've seen a very few cases where this hasn't been
done and I think none at all where it couldn't be done if the relevant
software authors cared to bother.
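
A minimal sketch of that arrangement, squeezed into one translation unit so it can be compiled as-is.  The names foo_fwd.h, a.h, b.h, a_use() and b_use() are made up for illustration; the commented regions stand in for what would really be separate files.  (C11 tolerates an identical repeated typedef, C99 and earlier do not, which is why the guard matters.)

```c
/* ---- hypothetical foo_fwd.h, as pulled in by a.h ---- */
#ifndef FOO_FWD_H
#define FOO_FWD_H
struct foo;                 /* incomplete ("forward") declaration */
typedef struct foo FOO;     /* emitted exactly once thanks to the guard */
#endif

/* ---- hypothetical a.h ---- */
int a_use(FOO *f);

/* ---- hypothetical foo_fwd.h again, as pulled in by b.h ---- */
#ifndef FOO_FWD_H
#define FOO_FWD_H
struct foo;
typedef struct foo FOO;     /* skipped: FOO_FWD_H is already defined */
#endif

/* ---- hypothetical b.h ---- */
int b_use(FOO *f);

/* ---- the one file that actually needs the complete type ---- */
struct foo { int value; };

int a_use(FOO *f) { return f->value + 1; }
int b_use(FOO *f) { return f->value * 2; }
```

Either "header" can come first; the guarded file makes the order irrelevant.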

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: CVS commit: src/sys/arch/powerpc/oea

2010-11-15 Thread der Mouse
>>> Every header file should include the things it requires to compile.

I've long felt this way: that, except for a very few examples like
<assert.h> that are defined to depend on context, the order of
#includes should not matter.  In particular, if multiple files must be
included, any of them may come first - so any file that generates
errors if it's included first needs fixing.  (Well, unless it's an
internal file, one that shouldn't be included directly.)

I've got numerous fixes to 4.0.1 for such issues, in case anyone thinks
it's worth applying this stance to 4.x.

> [...] just forward declarations of the structs.

> (this is, btw, one of the reasons to avoid silly typedefs)

I'm not sure what typedefs have to do with it.  typedeffing a name to
an incomplete ("forward") struct type works just fine:

struct foo;
typedef struct foo FOO;

(You can't do anything with a FOO without completing the struct type,
but you can work with pointers to them.)
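
A small sketch of the point, with illustrative names (foo_get, foo_set and foo_roundtrip are not from any real header): code that only passes FOO pointers around compiles before the struct is completed, and only the code that looks inside needs the full definition.

```c
/* struct foo is incomplete from here on; FOO names the same type. */
struct foo;
typedef struct foo FOO;

/* Pointers to the incomplete type are fine in prototypes... */
int foo_get(const FOO *f);
void foo_set(FOO *f, int v);

/* ...and in code that merely hands them on. */
static int foo_roundtrip(FOO *f, int v)
{
    foo_set(f, v);
    return foo_get(f);
}

/* Only code that dereferences needs the completed type. */
struct foo { int value; };

int foo_get(const FOO *f) { return f->value; }
void foo_set(FOO *f, int v) { f->value = v; }
```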

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-14 Thread der Mouse
Just a note to avoid having incorrect information in the archives
without a correction.  I wrote:

> [%] It occurs to me, the VAX's BBSSI and BBCCI _are_ CAS, just
> restricted to a one-bit-wide operand (and with the data-to-swap-in
> specified by choice of instruction rather than an operand).

This is false.  CAS, as the term is normally used, turns out to be not
the compare-and-swap the acronym expands to but
compare-and-conditionally-swap.  However, BBSSI and BBCCI always
perform the write.
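
The distinction can be sketched with C11 atomics.  This is only a model of the semantics described above, not VAX code: the helper names are invented, and atomic_fetch_or()/atomic_fetch_and() stand in for the "always writes" behaviour of BBSSI/BBCCI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compare-and-conditionally-swap: the store happens only on a match. */
static bool cas_uint(atomic_uint *p, unsigned expected, unsigned desired)
{
    return atomic_compare_exchange_strong(p, &expected, desired);
}

/* BBSSI-like: report whether the bit was set, and set it regardless. */
static bool test_and_set_bit(atomic_uint *p, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (atomic_fetch_or(p, mask) & mask) != 0;
}

/* BBCCI-like: report whether the bit was set, and clear it regardless. */
static bool test_and_clear_bit(atomic_uint *p, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (atomic_fetch_and(p, ~mask) & mask) != 0;
}
```

With cas_uint() a mismatch leaves memory untouched; with the other two the write is issued whatever the old value was.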

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-13 Thread der Mouse
> Already wrote about reasoning in other reply to mouse (keywords: no
> code duplication, better design), so wont repeat myself.

Except you appear to think that the only options are the current state
and each arch totally doing its own thing.

There are places other than those two to draw the dividing line.
Indeed, there are semi-MI implementations possible, such as the one I
sketched in a message I sent just minutes ago, allowing arches that can
to share code while not imposing an inappropriate paradigm on those
that don't fit the mold.

Somewhat like the way the various 68k ports share nontrivial amounts of
code through sys/arch/m68k (and similar things in userland) and
similarly for MIPS and SH3 and possibly others I haven't noticed.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-13 Thread der Mouse
> Hadn't it been much nicer of we just had the mutex and lock
> abstraction, and left the whole implementation to each MD part?

Exactly.  Consider this hypothetical:

x86 does #define ATOMIC_OPS_USE_CAS and defines a CAS(); MI code
notices this and defines all the higher-level primitives (if that's not
too much of an oxymoron) in terms of CAS().

ppc, arm, all the arches sufficiently "modern" to have CAS, likewise.

Arches without a sufficiently general CAS[%] do not define
ATOMIC_OPS_USE_CAS and provide their own implementations of mutexes,
spinlocks, whatever.

That seems to me like a mostly sane way to do it.  If I can come up
with it in thirty seconds, it seems likely anyone capable of doing such
overhauls could come up with it.

Instead, all arches must implement a fully-general CAS.  Seems to me
like a lose.
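
A rough sketch of the hypothetical, in userland C11 rather than kernel code.  ATOMIC_OPS_USE_CAS and CAS() are the names from the message; mi_spinlock_t and the mi_spin_*() functions are invented here, and the "MD" part is faked with <stdatomic.h> so the example is self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* ---- what an MD header might supply on an arch that has CAS ---- */
#define ATOMIC_OPS_USE_CAS 1
static inline bool CAS(atomic_uint *p, unsigned oldv, unsigned newv)
{
    return atomic_compare_exchange_strong(p, &oldv, newv);
}

/* ---- MI code: use the shared implementation only if MD opted in ---- */
#ifdef ATOMIC_OPS_USE_CAS
typedef struct { atomic_uint held; } mi_spinlock_t;

static inline void mi_spin_init(mi_spinlock_t *l) { atomic_init(&l->held, 0); }
static inline bool mi_spin_trylock(mi_spinlock_t *l) { return CAS(&l->held, 0, 1); }
static inline void mi_spin_lock(mi_spinlock_t *l)
{
    while (!mi_spin_trylock(l))
        ;   /* spin until the holder releases */
}
static inline void mi_spin_unlock(mi_spinlock_t *l) { atomic_store(&l->held, 0); }
#else
/* An arch without a general CAS would provide its own mi_spinlock_t and
 * mi_spin_*() here, built on whatever primitive it does have. */
#error "no CAS: this arch must supply its own spinlock implementation"
#endif
```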

[%] It occurs to me, the VAX's BBSSI and BBCCI _are_ CAS, just
restricted to a one-bit-wide operand (and with the data-to-swap-in
specified by choice of instruction rather than an operand).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-12 Thread der Mouse
> Exactly!  And I would like to emphasize that this has nothing to do
> with breaking of MI and MD abstraction or x86-centric view.

Yes, I imagine you would.  But it's still false.

> Decision was to provide CAS abstraction [sic!] as a primitive for MI,
> by the MD land - in a same way, like we have copy(9), fetch(9),
> store(9) or many other means, just in this case MI asks MD to ensure
> atomicity.  It was relevant to make a break-through for better SMP
> support, since it is an essential primitive used for synchronisation.

It is not.  It may be essential to your preferred implementation, but
it is not essential to synchronization.  (Nor pretty much anything
else, actually.  Not even if CAS is the operation you want for some
reason; CAS can be implemented in terms of other primitives.)
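
As one illustration of that last remark, here is CAS built on nothing but a test-and-set flag.  It is a userland sketch under stated assumptions: emulated_cas() and cas_guard are invented names, a single global guard serializes every caller, and the result is atomic only with respect to other code that goes through the same guard (a kernel version would also have to deal with interrupts).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One test-and-set flag guarding all emulated CAS operations. */
static atomic_flag cas_guard = ATOMIC_FLAG_INIT;

static bool emulated_cas(unsigned *p, unsigned expected, unsigned desired)
{
    bool swapped = false;

    /* Acquire: spin until test-and-set reports the flag was clear. */
    while (atomic_flag_test_and_set_explicit(&cas_guard, memory_order_acquire))
        ;

    /* The compare and the conditional store, done under the guard. */
    if (*p == expected) {
        *p = desired;
        swapped = true;
    }

    /* Release the guard. */
    atomic_flag_clear_explicit(&cas_guard, memory_order_release);
    return swapped;
}
```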

The correct MI abstraction here is "synchronization".  (Or "atomic
queue", or "mutex", or whatever.)  CAS, if used, should be a part of
the implementation.  It may even be shared among MD implementations for
which it is appropriate.  But to impose it on arches which don't have
it but which do have perfectly good synchronization (or whatever)
primitives is to draw the MI/MD break at the wrong place, to push MD
aspects into supposedly-MI code, just as much as building code based on
INSQTI/REMQHI and then requiring everything not supporting atomic DLL
ops to implement those somehow would be.

> What Johnny apparently suggests is to revisit mutex(9) interface,
> which is known to work very well, and optimise it for VAX.  Well, I
> hope we do not design MI code to be focused on VAX.

Why not?  You don't mind designing it in (other) MD ways.

> If we do, then perhaps I picked the wrong project to join.. :)

One of us certainly did.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-12 Thread der Mouse
> It is not about mainstream.  Please tell me one architecture that has
> been created in the last 10 years, supports at least 32bit address
> space, virtual memory and doesn't support either CAS or LL/SC.

What's that got to do with it?

NetBSD used to be about proper separation between MI and MD so that
multiple architectures can be accommodated.  Even if that meant a lot
of hard thinking to find the right line between MI and MD to take
advantage of things like the MP-ready queue instructions bqt pointed
out the VAX has.

This makes it appear it is now about lazy coding so that the
"mainstream" architecture can be supported and other arches can be
kinda-mostly supported as long as they're close enough to x86_6^Wthe
"mainstream" one.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mutexes, locks and so on...

2010-11-12 Thread der Mouse
>> Over 15 years ago NetBSD had a possibility to take everyone into
>> account [...]
> So what you are arguing is that MI needn't be so much MI anymore, and
> that supporting anything more than mainstream today is more to be
> considered a lucky accident than a desired goal?

Looks to me like pretty much exactly what pooka was saying.

> Oh well!  I guess I should go away now.

And me, and everyone else running anything but x86_64 (and, maybe,
i386; I don't know whether that's sufficiently modern to count).

Compilers that page themselves to death unless given over twice the RAM
a uV2 maxes out at.  Decisions driven by "a megabyte of disk costs
what, $0.8?".  Now this.

bqt, wanna start a fork?  Looks as though NetBSD no longer supports
most of the architectures it used to.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread der Mouse
> The only right way to retrieve P->V translation is to lookup from
> vm_map (== the fault handler).

What about setting up DMA on machines whose DMA uses physical
addresses?  Or does the DMA code get an exception to this rule?

I also suspect debugging may well be a non-ignorable use case, though I
could also be wrong about that.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mlock() issues

2010-10-20 Thread der Mouse
> proc.curproc.rlimit.memorylocked.soft = 697976149

> With all of the above set, for some reason it's not possible to lock
> more than 666MB.

Well, 697976149 bytes is 665.6419+ MB, so it sounds to me as though
it's doing exactly what it should be.

Unless you're a disk manufacturer, in which case 697976149 bytes is
697+ "MB", but I suspect you're locking 666 MB, not 666 "MB".
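
The arithmetic, spelled out as a small sketch (the helper names are made up; the limit value is the one quoted above):

```c
#include <stdint.h>

/* Binary megabytes: 1 MB = 1024 * 1024 bytes. */
static double bytes_to_binary_mb(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Disk-manufacturer megabytes: 1 "MB" = 1000 * 1000 bytes. */
static double bytes_to_decimal_mb(uint64_t bytes)
{
    return (double)bytes / 1e6;
}

/* Does a request of request_mb binary megabytes fit under limit_bytes? */
static int fits_under_limit(uint64_t request_mb, uint64_t limit_bytes)
{
    return request_mb * 1024u * 1024u <= limit_bytes;
}
```

For 697976149 bytes the first function gives a little over 665.64 and the second a little under 698; 665 MB fits under the limit and 666 MB (698351616 bytes) does not.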

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: acpivga(4) v. MI display controls

2010-10-17 Thread der Mouse
> This is the other main difference to OF: on ports using OF, it is
> always available.  ACPI on i386 is not (yet).

It's not quite as simple as "ports using OF" and "ports not using OF".

For example, while I have personal experience with only the one unit,
http://www.netbsd.org/ports/sparc/javastation.html#mrcoffe implies
fairly strongly that some JavaStation-1s have OBP and others have OF.
And I _think_ some of the older machines supported (at least last I
knew!) by NetBSD/sparc are old enough to have neither, though I'd have
to dig out examples and try them to be sure - I may be confusing Sun-3s
with Sun-4s.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: acpivga(4) v. MI display controls

2010-10-15 Thread der Mouse
>> The task is not trivial.  On modern x86, practically *everything*
>> that attachs has an ACPI counterpart.  In a way we are thinking this
>> backwards: the attachment should perhaps be done via ACPI that has
>> information about the "natural" device tree

ACPI may be the source of the information, but that doesn't mean it has
to be how the autoconf tree is constructed.

Compare and contrast with how NetBSD/sparc uses the OF (or is it OBP?
I'm not sure) device tree to drive autoconf, but doesn't have a device
node corresponding to OF that everything attaches under; it just uses
the OF tree as the source of the data about what exists where.  (Well,
much of it; autoconf doesn't totally mirror OF, eg, in SCSI device
attachment.)

> This should be solved once and for all, for all acpi(4) and for all
> pci(4), isa(4), ... Otherwise we end up with god-awful mess.

Has anyone tried handling it by giving devices multiple parents?  This
would clean up some other things, such as wscons (there arguably ought
to be something which is parented to both the wsdisplay and the wskbd).
I have no idea how much hair it would introduce, though.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
> I wonder if you would need this kind of console even for ddb work.

Well, I want it to work for direct kernel<->user interaction; ddb is an
example of that, but so are pre-single-user things such as userconf and
printing autoconf output.

> If you need console for ddb, things get messy.

Indeed they do.

As I mentioned in another message I sent moments ago, I'm currently
looking at, basically, importing SLIP into the relevant serial-port
driver, so that it becomes not just a tty and console driver, but a
tty, console, and network interface driver.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
> It seems you want to have a console (in the wscons sense)

Well, no; I want a serial console, not a wscons console.

> which is *not* associated with a serial port, and then bind that to a
> logical serial channel on a SLIP instance which is instead bound to a
> serial port.

That would be one way to get the effect I want.

Another way - the one that I'm pursuing now - is to hack on the serial
line driver for the relevant console hardware to, loosely put, import
SLIP into it.  Instead of having two faces, a tty device face to the
kernel and a driver face to the hardware, I'm planning on giving it
three: a tty device and a network interface to the kernel and a driver
face to the hardware.

The userland schemes went out the window when I realized I really
wanted both tty and network device to be usable pre-single-user.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
>> I was looking at improving if_sl.c to support encapsulating "normal"
>> serial data on the SLIP's tty as packets, [...]
> I've encountered that on a serial protocol in my Arduino work.  It
> worked quite well i have to say.  It worked due to a packet starting
> with an unique byte.  [...]

> I don't know SLIP anymore by heart but maybe you could `invent'/use
> an IP# of 127.0.0.1 as a special case (or 0.0.0.0) to carry the
> key/terminal stuff.

Oh, that's not the problem.  This is where the v6 patches have
relevance: carrying v6 involves inventing a way to tag packets with
types; I used one of them for v6, and it would be easy to use another
for serial data.  The difficulty is in the kernel: SLIP works by
switching line disciplines, which makes it difficult to get normal
output processing to happen - and what I want is for that to happen and
then run the result through if_sl.c before sending it to the real
hardware.
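
To make the tagging idea concrete, here is a sketch of SLIP framing with a one-byte type tag in front of the payload.  The END/ESC byte values are the standard ones from RFC 1055; the tag byte and its position are purely an assumption for illustration and are not claimed to match the format in the v6 patches, and slip_frame() is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/* Framing bytes from RFC 1055. */
enum { SLIP_END = 0xC0, SLIP_ESC = 0xDB, SLIP_ESC_END = 0xDC, SLIP_ESC_ESC = 0xDD };

/*
 * Frame `n` payload bytes, preceded by a hypothetical one-byte type tag.
 * Returns the number of bytes written to `out`, or 0 if `cap` is smaller
 * than the worst case (every byte escaped, plus two END bytes).
 */
static size_t slip_frame(uint8_t type, const uint8_t *payload, size_t n,
                         uint8_t *out, size_t cap)
{
    size_t pos = 0;

    if (cap < 2 * (n + 1) + 2)
        return 0;

    out[pos++] = SLIP_END;              /* leading END flushes line noise */
    for (size_t i = 0; i <= n; i++) {
        uint8_t b = (i == 0) ? type : payload[i - 1];
        if (b == SLIP_END) {
            out[pos++] = SLIP_ESC;
            out[pos++] = SLIP_ESC_END;
        } else if (b == SLIP_ESC) {
            out[pos++] = SLIP_ESC;
            out[pos++] = SLIP_ESC_ESC;
        } else {
            out[pos++] = b;
        }
    }
    out[pos++] = SLIP_END;              /* trailing END closes the frame */
    return pos;
}
```

The receiver would undo the escaping, peel off the first byte, and dispatch on it: one value for v4, one for v6, another for serial data.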

In the light of morning, I'm not sure this will get me what I want even
if I make it work.  I need to think about it more.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
>> I have a situation in which it would be useful to run SLIP on a
>> serial console.  [...]
> Given how rare the situation is, maybe it's best to do the
> encapsulation/decapsulation in user mode, and feed into SLIP via a
> pty.  (I also suspect that the speeds are low, but I know you often
> run older machines.)

Possibly.  I've had remarkably bad luck using TIOCCONS.  I'd also
rather not wrap the serial data in IP packets.  The speeds are as high
as the machines involved support, which doesn't say all that much.

Given the difficulty I've had trying to figure out how to implement
this, I think I may be better off coming at it from another angle in
any case.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
>> I have a situation in which it would be useful to run SLIP on a
>> serial console.  Obviously, this won't work very well at present.
>> (The machine has only one serial port and no useful network
>> interfaces.)
> Maybe something like SLIRP (http://en.wikipedia.org/wiki/Slirp)?
> Haven't used it in aeons and never tried it under NetBSD, though.

That is pretty close to sliplogin(8), and it deals with turning a login
tty into a serial IP transport; it does nothing at all about
encapsulating ordinary serial data that would otherwise be sent down
the serial line into something that can be pulled out on the peer.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
>> I was thinking of making this another protocol type, akin to what I
>> mentioned (probably on tech-net) back in '02 -
> 
> Maybe you could get IETF to standardize an extension to *PPP* :-)
> 

While I hesitate to feed self-admitted trolls :), there actually is a
real point lurking here.

(a)

There's a reason I published my documentation of it as a file in my FTP
space rather than trying to get even an I-D created: the IETF's hoops
are insane.  RFCs used to be people floating ideas.  Then they got
formalized, so I-Ds were created to be the informal version.  But now
even I-Ds have accreted too much formality to be useful.

(b)

I specifically do not want PPP.  In '02, when I first designed the
extension to SLIP, I sunk an afternoon into trying to use PPP.  My use
case then was a hardwired serial line between two ports dedicated to
the purpose; I was utterly unable to create a configuration that would
come back up seamlessly for every order and timing of reboots.  About
the only thing PPP would have brought to that use case, if I had made
it work, would have been detection of a dead peer, and that was of no
value in that case, since there was nothing to do but drop the packets
if the peer was dead, and that's exactly what SLIP does.

Now, it's possible that I just missed something.  But, while no way am
I a true expert at networking, I'm pretty good; if I can't make it work
in a whole afternoon of trying, at the very least it suffers from being
too complicated.

As soon as I switched to SLIP, on the other hand, it worked first time,
every time.  "Too simple to break".  (The reason I didn't start with
SLIP was that I needed to carry v6 packets.  After an afternoon of
struggling with PPP, an hour or two of hacking on the SLIP
implementation and I had packets flowing.)

(c)

The IETF lives on "rough consensus and running code".  I've got the
running code for v6 and thought some people might be interested either
in that (hence the PR) or in chipping in on how to multiplex serial
data into it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


SLIP coexisting with serial data?

2010-10-10 Thread der Mouse
I have a situation in which it would be useful to run SLIP on a serial
console.  Obviously, this won't work very well at present.  (The
machine has only one serial port and no useful network interfaces.)

I was looking at improving if_sl.c to support encapsulating "normal"
serial data on the SLIP's tty as packets, thus merging it into the
packet stream.  I ran into some problems, but think I can handle them;
I'm writing to ask if there's any interest from anyone else in this
sort of thing, in upgrading SLIP to support "normal" serial output.

I was thinking of making this another protocol type, akin to what I
mentioned (probably on tech-net) back in '02 - I just now (finally)
filed kern/43959 containing patches to support v6 as well as v4,
something that's easy compared to making the tty still work as a tty,
but which includes a good deal of multi-protocol scaffolding that's
semi-necessary for the way I've been envisioning doing serial data.

Thoughts?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Capsicum: practical capabilities for UNIX

2010-09-27 Thread der Mouse
> POSIX.1e "capabilities" are actually coarse-grained OS privileges,

Not all that coarse-grained as compared to traditional Unix privileges!

> [POSIX-style "capabilities"] solve (or, in some cases, don't solve)
> an orthogonal problem in UNIX security: how to decompose root
> privilege.

Not all that orthogonal.  I'd say that they're really solving the same
thing, just in different ways and at different levels of granularity,
that "same thing" being the question of how to convert the single level
of privilege offered by the hardware (user mode versus kernel mode, in
Unix terminology) into something more useful.

Traditional Unix breaks this into three: kernel mode, root user mode,
and non-root user mode.  POSIX "capabilities" (which, based on what was
said upthread, are remarkably like VMS privileges) break it down a bit
further.  Capabilities in the sense everyone but POSIX uses the term :)
break it down even further and in a somewhat different way, but it's
still addressing the same basic problem: how to allow/deny access to
resources in a more useful way than the all-or-nothing way the hardware
provides.

> Capabilities in a classic security sense are unforgeable tokens of
> authority that can be delegated.

Sounds a lot like POSIX "capabilities" to me - it's just that the
authority comes in, as compared to non-POSIX "capabilities", relatively
coarse chunks and is passed around in a rather different way.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: where is my memory?

2010-09-22 Thread der Mouse
>>> total memory = 2047 MB
>>> avail memory = 1999 MB
>> total memory = 256 MB
>> avail memory = 239 MB
> Some graphics chips, especially on lower-end machines, use main
> memory, thus making it unavailable to the CPU.

It's not that simple.  I've seen this for a very long time, including
on machines with no graphics chips at all, such as 4.3 on a VAX 750.
Here's a live example; this is quoted from /var/run/dmesg.boot on a
SPARCstation LX that's my desktop at one of my workplaces:

total memory = 48688 KB
avail memory = 41676 KB

That's with a cg6 with its own private framebuffer RAM.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: O_DIRECTORY

2010-09-13 Thread der Mouse
> Is there a way to open a directory on which you have neither read nor
> write access (maybe not even search access) ?

I've thought for some time there should be an O_NOACCESS, to open
things without any ability to do I/O.  The major use for this I've
found is opening directories for later feeding to fchdir(), but I
suspect that if it existed people would find other uses for it.
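
For reference, the fchdir() pattern as it can be written without any such flag.  O_NOACCESS is hypothetical and is not used here; this sketch opens the directory O_RDONLY, which needs read permission on it, and that is exactly the limitation at issue.  visit_and_return() is an invented name.

```c
#include <fcntl.h>
#include <unistd.h>

#ifndef O_DIRECTORY
#define O_DIRECTORY 0   /* fall back to a plain open() where it is missing */
#endif

/*
 * Change to `dir`, then come back to the starting directory by way of a
 * descriptor saved beforehand.  Returns 0 on success, -1 on any failure.
 */
static int visit_and_return(const char *dir)
{
    int saved = open(".", O_RDONLY | O_DIRECTORY);
    int rc = -1;

    if (saved < 0)
        return -1;
    if (chdir(dir) == 0) {
        /* ... work relative to dir would go here ... */
        rc = fchdir(saved);     /* return, even if "." has since been renamed */
    }
    close(saved);
    return rc;
}
```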

I'm not sure what, if any, restrictions there should be on it, which is
one reason I've never actually tried to implement such a thing.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: scsipi mid-layer, tagged queueing, and ciss(4)

2010-08-29 Thread der Mouse
> It does report the CmdQue bit in the flags3 field of the inquiry, but
> leaves the ANSII value in the version field 0 - which indicates the
> flags3 field isn't present.  The scsipi layer doesn't check the
> flags3 value because the ANSII value is < 2, and thus doesn't set the
> tagged queueing capability for the device.

This sounds to me like just the sort of thing quirks are for.  Is there
some reason not to make this a quirk?  Or Am I Missing Something (tm)?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-24 Thread der Mouse
>>> I believe that non-contiguous netmasks actually are illegal nowadays.
>> Cite?
> RFC 4632 (CIDR Address Strategy), section 5.1:

...which is titled "Rules for Route Advertisement".  (Also, 4632 is a
BCP, not a standard.)

> "  An implementation following these rules should also be generalized,
>    so that an arbitrary network number and mask are accepted for all
>    routing destinations.  The only outstanding constraint is that the
>    mask must be left contiguous."

With respect to route aggregation in advertisements (ie,
externally-visible behaviour).  See the second paragraph of 5.2.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-23 Thread der Mouse
> Non-contiguous masks can indeed be useful, albeit only in specialized
> topologies and networks.  I could have used them in a paper I
> published just 1.5 years ago.  The trouble is that they conflicted
> with the routing table definition necessary for CIDR, and CIDR was
> and is necessary for the survival of the Internet.

Hm?  Conflicted how?  The routing table structures in use in NetBSD,
past and present, seem to handle both CIDR and noncontiguous netmasks
just fine - provided, as people point out, you avoid ambiguous cases,
but that's inevitable unless and until you pick a rule to resolve the
ambiguities.  (It doesn't need to be standardized, at least not beyond
the boundary within which the noncontiguous netmasks are confined.)

After all, CIDR masks are just a special case of arbitrary-bitmask
masks; code to handle the latter correctly will necessarily handle the
former.  (Conversion between bitmask masks and CIDR-style lengths is,
or at least can be, just an interface issue.)
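
A sketch of why that holds, using 32-bit host-order addresses and invented helper names.  The match test never asks whether the mask is contiguous; a prefix length is just a compact way of writing one particular family of masks.

```c
#include <stdbool.h>
#include <stdint.h>

/* A mask is contiguous if its complement has the form 0...01...1. */
static bool mask_is_contiguous(uint32_t mask)
{
    uint32_t inv = (uint32_t)~mask;
    return (inv & (uint32_t)(inv + 1u)) == 0;
}

/* CIDR-style length to bitmask: purely an interface conversion. */
static uint32_t prefix_to_mask(unsigned len)
{
    return len == 0 ? 0u : (uint32_t)(UINT32_MAX << (32u - len));
}

/* The match itself works for any bitmask, contiguous or not. */
static bool route_matches(uint32_t addr, uint32_t net, uint32_t mask)
{
    return (addr & mask) == (net & mask);
}
```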

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-23 Thread der Mouse
> Was [running my house LAN with a noncontiguous netmask], for
> practical purposes, unsupportable?  Was it something likely to cause
> subtle bugs all over the networking stack?  Was it something
> obsoleted more or less 20 years ago?  All yes.

Actually, no.

Unsupportable?  I don't see anything unsupportable about it.  Every
system I tried (which admittedly wasn't all that many) supported it
fine.  Even today, I tried NetBSD 4.0.1 (the most recent I have easy
admin access to) and it appeared to support it as well as whatever I
was using at the time did - though admittedly I didn't actually verify
that packets were routed the way the resulting routing table implied.

Likely to cause bugs?  Nonsense.  Likely to expose existing bugs,
perhaps.  Do you not consider exposing existing bugs a good thing?
I know I certainly do.

Obsoleted 20 years ago?  Perhaps.  Strikes me as pretty functional and
useful for an "obsoleted" feature.  Besides, this _was_ 20 years ago -
well, actually more like 15±5; I didn't have much of a house LAN
before maybe 1991, and I stopped using the address space this was
embedded in sometime around 2000-2001.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-23 Thread der Mouse
>> IMO anything that pretends to implement IPv4 but which doesn't do
>> noncontiguous netmasks is simply broken, I don't care whether it
>> comes from Cisco or Netgear or NetBSD.
> For that to work at all across multiple implementations would require
> a standard to tell you, when your destination address matches more
> than one route, which of those routes takes precedence.

That...disagrees with my experience.

I ran my house LAN with a noncontiguous netmask for years without any
such standard.  Worked just fine.

Perhaps you meant "to work consistently in certain cases" rather than
"to work at all"?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-23 Thread der Mouse
> I believe that non-contiguous netmasks actually are illegal nowadays.

Cite?

> They became illegal when CIDR was implemented.

Implemented?  I doubt it.  Standardized, at most.  But even then, it
would take years to eliminate everything that supports them - indeed, I
just now tried it and find that NetBSD 4.0.1 appears to support them.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 16 year old bug

2010-08-23 Thread der Mouse
> On the other hand, simple non-contig netmasks, with no ambiguity,
> certainly were permitted originally.  They work just fine.  They also
> offer essentially nothing over contig netmasks - they're just a
> permutation of the bits in the addresses.

I wouldn't say _nothing_.  See below.

> The one (the only) reason they were permitted, that I know of anyway,
> was that by allowing them we apparently let some (perhaps
> hypothetical) sites implement subnets without altering their existing
> IP numbering scheme.  I personally never saw such a site, and have no
> direct evidence one ever existed (or that anyone ever actually used
> non-contig netmasks for this reason) - but that was the argument
> anyway.

I have.  For a significant time (years) I was running my house LAN with
a netmask ending in (binary) 11011000, I think it was - a /29 expanded
by adding a second /29 from higher up.  (The memory is very fuzzy, but
255.255.255.216 looks right.)
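
A worked illustration of what such a mask selects, looking only at the last octet.  The base value 64 used in the note below is an arbitrary stand-in, not the actual addressing, and subnet_members() is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * List every last-octet value that falls in the same subnet as `base`
 * under `mask`.  `out` must have room for 256 entries.  Returns the count.
 */
static size_t subnet_members(uint8_t base, uint8_t mask, uint8_t *out)
{
    size_t n = 0;

    for (int v = 0; v < 256; v++) {
        /* Same subnet means: equal in every bit position the mask covers. */
        if ((v & mask) == (base & mask))
            out[n++] = (uint8_t)v;
    }
    return n;
}
```

With base 64 and mask 216 (binary 11011000) this yields 64 through 71 and 96 through 103: two /29-sized blocks, sixteen addresses, one subnet.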

The reason was exactly this: growing the space without renumbering when
the original space's pair had already been allocated elsewhere.  Was it
necessary?  Not for most values of "necessary".  Was it useful?
Definitely.  Not visible outside its parent network, of course, but
that's true of most subnetting schemes, including CIDR ones, and it was
in live use for years.

>> I was actually at the pre-CIDR IETF meeting where it was discussed
>> whether to standardize the forwarding lookup for routes with
>> non-contiguous masks or disallow them altogether.

Out of scope.  A host's routing implementation is not visible from the
network; it is not a matter for the IETF to standardize.

If you want to forbid noncontiguous netmasks in wire protocols like BGP
or RIP or whatever, that is in scope, but also irrelevant to what
you're describing.

>> You are almost 20 years too late to influence that outcome.

Irrelevant.  Nobody off-network can tell whether I'm using
noncontiguous netmasks within my network, so nobody but my
co-administrators has standing to even comment on the question.

Of course, NetBSD may, if it wishes, desupport them.  It also may, if
it wishes, desupport netmask boundaries falling other than on octet
boundaries.  I would call the one a bug just as I would the other.



Re: 16 year old bug

2010-08-23 Thread der Mouse
>> Fix a 16 year old bug in the sorting routine for non-contiguous netmasks.
> Does our IPSEC code actually _use_ non-contiguous netmasks?

I haven't looked at the IPsec code, so this is a guess, but the wording
makes it sound as though this is an implementation technique used
internally by IPsec rather than being the externally-visible use of
noncontiguous netmasks everyone seems to be taking it for.

That said,

> and most modern network hardware will turn their nose up at them
> AFAIK.

IMO anything that pretends to implement IPv4 but which doesn't do
noncontiguous netmasks is simply broken; I don't care whether it comes
from Cisco or Netgear or NetBSD.

Not, I suppose, that anyone necessarily cares what I consider broken.

Slow-path them.  Require a sysctl switch (the way we do for source
routes).  Fine.  But outright desupport them?  I'd call that a bug,
even if it is done deliberately.



Re: ptrace(2) PT_STEP changes and gdb

2010-08-17 Thread der Mouse
> Can't you just version it?  Rename existing PT_STEP to PT_OSTEP or
> something, define PT_STEP with the new value (instead of introducing
> new PT_* name)?

That works for ABI compatibility but not API compatibility.



Re: something really screwed up with mmap+ffs on 5.0_STABLE

2010-08-17 Thread der Mouse
> The only difference between the two programs is this:
> #if 1
> read(fd, buf, BUFSIZE);
> bmem = (void *)buf;
> #else
> busmem = mmap(NULL, sb.st_size, PROT_READ, MAP_FILE|MAP_SHARED, fd, 0);
> if (busmem == MAP_FAILED)
> err(1, "mmap");
> bmem = busmem;
> #endif

Any particular reason to error-check mmap but not read?

Any particular reason to read BUFSIZE bytes but mmap sb.st_size bytes?
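
For comparison, a minimal sketch of what an error-checked read of a
known length might look like.  read_full() is a hypothetical helper
name, not something from the quoted program; it uses only the standard
read(2) call and loops because read may legitimately return fewer bytes
than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf, retrying on short reads and
   on EINTR.  Returns 0 on success; returns -1 on a read error (errno is
   left as set by read) or on premature end of file (errno is set to 0
   so the caller can tell the two apart). */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0) {
            errno = 0;          /* EOF before len bytes arrived */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Calling it with the same length that the mmap branch uses (sb.st_size in
the quoted fragment, assuming the buffer is at least that large) would
make the two branches look at the same number of bytes.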



Re: kicking everybody out of the softc

2010-08-15 Thread der Mouse
> I've been working in spare moments on lockless code to prevent
> storage for a softc from going away while a driver uses it.  [...]

This looks superficially good, but isn't the cost of the necessary
memory barriers and cache flushes comparable with the cost of a more
traditional scheme?



Re: weird ktrace timestamps?

2010-08-04 Thread der Mouse
>> There are two interesting things here: the timestamps in the file do
>> not warp backwards by almost two seconds - but they do warp
>> backwards, by 3353 ns.
> Try a different time counter.  See kern.timecounter.

kern.timecounter.choice = clockinterrupt(q=0, f=100 Hz) TSC(q=-100, 
f=1795636350 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) 
dummy(q=-100, f=100 Hz)
kern.timecounter.hardware = ACPI-Fast
kern.timecounter.timestepwarnings = 0

I switched to TSC and got the same syndrome (well, as far as kdump
output goes; I didn't dig into ktrace.out).  Switching to
clockinterrupt made the syndrome go away, but all the timestamp deltas
printed by kdump -R were zero (there probably were a very few that
weren't, but I didn't see any of them in a quick eyeball skim).

> Is this UP or MP?

MP.

cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Core 2 (Merom) (686-class), 1795.62 MHz, id 0x6fd
cpu0: features bfebfbff
cpu0: features bfebfbff
cpu0: features bfebfbff
cpu0: features2 e3bd
cpu0: "Intel(R) Core(TM)2 Duo CPU T7100  @ 1.80GHz"
cpu0: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way
cpu0: L2 cache 2 MB 64B/line 8-way
cpu0: using thermal monitor 1
cpu0: Enhanced SpeedStep (1420 mV) 2000 MHz
cpu0: unknown Enhanced SpeedStep CPU.
cpu0: using only highest and lowest power states.
cpu0: Enhanced SpeedStep frequencies available (MHz): 2000 1200
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 64 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel Core 2 (Merom) (686-class), 1795.43 MHz, id 0x6fd
cpu1: features bfebfbff
cpu1: features bfebfbff
cpu1: features bfebfbff
cpu1: features2 e3bd
cpu1: "Intel(R) Core(TM)2 Duo CPU T7100  @ 1.80GHz"
cpu1: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way
cpu1: L2 cache 2 MB 64B/line 8-way
cpu1: using thermal monitor 1

I did consider the possibility that MP was relevant, but it strikes me
as highly unlikely that it would migrate between processors exactly
there every time (and, apparently, _only_ there).

Is it likely enough that it's something else tied to MPness that it's
worth trying a test under a UP kernel?



weird ktrace timestamps?

2010-08-04 Thread der Mouse
Is there known to be something weird with ktrace record timestamps
under 4.0.1 (i386, in case it matters)?

I just did some ktracing (using ktrace -i -p ..., in case it matters)
on an i386 machine.  The first two lines of kdump -R output on the
resulting file were

   693  1 xshowppm 0.0 EMUL  "netbsd"
   693  1 xshowppm -1.96647 CALL  accept(3,0xbfbfe90c,0xbfbfe908)

The peculiar apparent timewarp made me look deeper.  Here's (the
beginning of) hexdump -C output on ktrace.out:

00000000  06 00 00 00 07 00 01 00  b5 02 00 00 78 73 68 6f  |............xsho|
00000010  77 70 70 6d 00 00 00 00  00 00 00 00 00 00 00 00  |wppm............|
00000020  8e 89 59 4c 97 15 a1 2a  01 00 00 00 6e 65 74 62  |..YL...*....netb|
00000030  73 64 14 00 00 00 01 00  01 00 b5 02 00 00 78 73  |sd............xs|
00000040  68 6f 77 70 70 6d 00 00  00 00 00 00 00 00 00 00  |howppm..........|
00000050  00 00 8e 89 59 4c 7e 08  a1 2a 01 00 00 00 1e 00  |....YL~..*......|
00000060  00 00 0c 00 00 00 03 00  00 00 0c e9 bf bf 08 e9  |................|
00000070  bf bf 10 00 00 00 02 00  01 00 b5 02 00 00 78 73  |..............xs|

Based on , I break down the first two records as

06 00 00 00                 ktr_len, 6
07 00                       ktr_type, KTR_EMUL
01 00                       ktr_version, 1
b5 02 00 00                 ktr_pid, 0x2b5 = 693
78 73 68 6f 77 70 70 6d     ktr_comm, "xshowppm"
00 00 00 00 00 00 00 00
00
00 00 00                    (compiler-generated struct padding)
8e 89 59 4c 97 15 a1 2a     ktr_time (_ktr_time._ts), 1280936334.715199895
01 00 00 00                 ktr_lid (_ktr_id._lid), 1
6e 65 74 62 73 64           record contents, "netbsd"

14 00 00 00                 ktr_len, 20
01 00                       ktr_type, KTR_SYSCALL
01 00                       ktr_version, 1
b5 02 00 00                 ktr_pid, 0x2b5 = 693
78 73 68 6f 77 70 70 6d     ktr_comm, "xshowppm"
00 00 00 00 00 00 00 00
00
00 00 00                    (compiler-generated struct padding)
8e 89 59 4c 7e 08 a1 2a     ktr_time (_ktr_time._ts), 1280936334.715196542
01 00 00 00                 ktr_lid (_ktr_id._lid), 1
1e 00 00 00 0c 00 00 00     record contents
03 00 00 00 0c e9 bf bf
08 e9 bf bf

There are two interesting things here: the timestamps in the file do
not warp backwards by almost two seconds - but they do warp backwards,
by 3353 ns.

So, there are two strange things here: (1) that the timestamps in the
file go backwards and (2) that kdump prints them as going backward much
further than they actually do.
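
The 3353 ns figure can be checked mechanically.  The sketch below is an
illustration, not code from the original message: it decodes the two
8-byte ktr_time fields exactly as they appear in the dump, assuming the
layout the breakdown above implies (32-bit little-endian seconds
followed by 32-bit little-endian nanoseconds).  le32() and
ts_delta_ns() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value, as an i386 kernel stores it. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Signed difference (b - a) in nanoseconds between two 8-byte
   timestamps laid out as 32-bit seconds then 32-bit nanoseconds. */
static int64_t ts_delta_ns(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    int64_t sec = (int64_t)le32(b) - (int64_t)le32(a);
    int64_t nsec = (int64_t)le32(b + 4) - (int64_t)le32(a + 4);
    return sec * 1000000000LL + nsec;
}

/* ktr_time bytes of the first and second records, copied from the dump. */
static const unsigned char rec1_time[8] =
    { 0x8e, 0x89, 0x59, 0x4c, 0x97, 0x15, 0xa1, 0x2a };
static const unsigned char rec2_time[8] =
    { 0x8e, 0x89, 0x59, 0x4c, 0x7e, 0x08, 0xa1, 0x2a };
```

Here ts_delta_ns(rec1_time, rec2_time) comes out as -3353: the second
record is stamped 3353 ns before the first, nowhere near the almost two
seconds kdump reports.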

This is not just ntpd happening by bad luck to adjust the clock right
then; I did another run and got the same syndrome (with a very slightly
different time delta) at the same point - the first and second records
of the file. Furthermore, the program forks, and the first two records
generated by the child show a very similar bogon in kdump output:

   693  1 xshowppm 0.04470 CALL  write(9,0x806c000,0x2800)
   721  1 xshowppm 0.12292 EMUL  "netbsd"
   721  1 xshowppm -1.97486 RET   fork 0
   693  1 xshowppm 0.05866 GIO   fd 9 wrote 4088 bytes

I haven't dug out the underlying records for this case; it's far enough
into the file they would be inconvenient to locate.  But the kdump
output is similar enough that I would expect it to be due to the same
cause, whatever it is.

Any thoughts?  I'll investigate this eventually if I don't hear
anything, but it strikes me as reasonably likely that someone will
recognize the syndrome and be able to point me in a useful direction.



Re: fd code multithreaded race?

2010-08-04 Thread der Mouse
> So something naive like this: [...] will give you the wrong result.

True.

Essentially, you have a shared resource (the fd-number to thing table)
which you are using without any locking.  Of course there are races!

Some threading setups allow different threads to have independent file
descriptor tables, which would avoid the issue you sketch (but, of
course, introduce other issues).

> Never realized file descriptors and threads were so tricky ;)

There's nothing special about file descriptors here.  You have
basically the same issue with any other piece of state which is shared
by all threads.  To pick two more examples the kernel maintains,
working directory and umask.  These are a shared resource; like any
shared resource, accessing them from multiple threads requires care,
and usually locking of some sort.



Re: Potential re(4) / netbsd-4 / i386 problem?

2010-07-26 Thread der Mouse
>> I've got 3 motherboards with re onboard that I've tested, 2 of the 3
>> have the problem.  I checked the re hwrev and the one that works
>> fine is 0x2800.  The 2 boards that don't work have hwrev
>> 0x3800 and 0x3C40.
> I'll have a look at the code and see if I can find the hwrev value
> you're talking about and print out its value for my hardware.

0x3400.



Re: Modules loading modules?

2010-07-26 Thread der Mouse
> We have a modular device driver, let's call it xxxmod.  [...]  It []
> might attempt to use an optional module (e.g., zzzverbose) to print
> some device attachment messages.

> First, a required module cannot be optional.  If the desired module
> is not present, or if it is present but its own
> xxx_modcmd(MODULE_CMD_INIT, ...) fails, the failure is propagated
> back to the original "outer" call to module_load() which will also
> fail.

> The second reason why this is not suitable is that the "outer" load
> will add a reference to the module, preventing it from being
> auto-unloaded.

Surely the right answer here is to provide a way to say "refer to this
module, but it's ok for its load to fail, and it's ok for it to get
auto-unloaded", including passing up whatever information is necessary
for the calling module to do something useful in failure cases?

It really seems to me that the module system is there to help us, not
to shackle us, and that if it has properties which are leading to
problems, one of the options we should at the very least be considering
is changing those properties.



Re: envsys issues [was Re: envstat wrong: who's at fault?]

2010-07-26 Thread der Mouse
>> [...]
> This is greatly cleaned up in the envsysV2 implementation.

>> [...]
> In envsysV2, a single call is used [...]

Then there's not really anything to discuss, because it's all already
been fixed.  Excellent.



envsys issues [was Re: envstat wrong: who's at fault?]

2010-07-26 Thread der Mouse
[moved from port-i386]

[me]
>> In passing, would it be appropriate and/or useful to suggest
>> improvements to [the envsys(4)] API?  When I was writing code, I
>> found the envsys(4) ioctls to be deficient for my purposes.
[Paul Goyette]
> I'd be interested in knowing in which ways you found the current API
> lacking.  [...]

Well, some things are underdocumented.  For example, it seems that most
sensors are not in the units specified by envsys_tre_data_t.units, but
rather in 1e-6 of that unit - for example ENVSYS_SWATTHOUR sensors are
not in watthours but in microwatthours - and, as far as I've been able
to find, this factor of 1e6 is not documented anywhere but the source
to envstat(8).  Except for temperature, which is specifically said to
be in microKelvins.  But others are said to be in volts, amps, etc.  To
quote both the envsys(4) and ,

 union {                         /* all data is given */
         uint32_t data_us;       /* in microKelvins, */
         int32_t data_s;         /* rpms, volts, amps, */
 } cur, min, max, avg;           /* ohms, watts, etc */
                                 /* see units below */

If the "micro" is supposed to apply to all those units, not just
Kelvins, then (a) "rpms" needs to be removed from the list and (b) the
wording needs to be improved.

But the one piece I've found so far that isn't just a lack of
documentation (well, unless there are totally undocumented calls) is
that there's no way to fetch multiple sensors' values without potential
for skew between them.  For example, to quote from the code I've been
developing,

 prev_charging = -1;
 prev_discharging = -1;
 /*
  * This loop exists in an attempt to avoid getting confused by things
  *  changing in between fetching one variable and fetching the other.
  *  It can still get confused if multiple changes occur during that
  *  interval, but, without some kind of assist from the API, that's
  *  not fixable.  (My preferred assist would be a way to fetch
  *  multiple sensors' values atomically; failing that, some kind of
  *  change serial number.)
  */
 while (1)
  { charging = get_boolean(SENSOR_CHARGING);
    if (charging < 0) return(0);
    discharging = get_boolean(SENSOR_DISCHARGING);
    if (discharging < 0) return(0);
    if ((charging == prev_charging) && (discharging == prev_discharging)) break;
    prev_charging = charging;
    prev_discharging = discharging;
  }

(get_boolean returns 0 or 1, wrapping ENVSYS_GTREDATA, with checks that
the sensor being fetched is of type ENVSYS_INDICATOR and has
ENVSYS_FCURVALID set; the SENSOR_* defines are application-specific).
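
The same read-until-stable idea can be exercised outside the kernel
interface.  The following is a self-contained sketch, not the code
quoted above: the real ENVSYS_GTREDATA fetch is replaced by a
hypothetical scripted source (struct script, next_reading()) so that
the retry logic can be run and checked on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real sensor fetch: hands out successive
   values from a fixed script and returns -1 once the script runs out. */
struct script {
    const int *vals;
    size_t n;
    size_t pos;
};

static int next_reading(struct script *s)
{
    assert(s != NULL);
    return (s->pos < s->n) ? s->vals[s->pos++] : -1;
}

/* Fetch the pair (a, b) repeatedly until two consecutive passes agree.
   Returns 1 and stores the pair on success, 0 if any fetch fails.  As
   with the loop above, an even number of changes between passes can
   still go unnoticed. */
static int read_stable_pair(struct script *s, int *a_out, int *b_out)
{
    int prev_a = -1, prev_b = -1;

    for (;;) {
        int a = next_reading(s);
        if (a < 0)
            return 0;
        int b = next_reading(s);
        if (b < 0)
            return 0;
        if (a == prev_a && b == prev_b) {
            *a_out = a;
            *b_out = b;
            return 1;
        }
        prev_a = a;
        prev_b = b;
    }
}
```

With the script 1,0, 0,1, 0,1 the first two passes disagree and the
third confirms the second, so the function reports (0, 1) after six
fetches; a best case still costs four fetches, which is the price of
not having an atomic multi-sensor read.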

Or is this fundamentally unfixable because sensors' values are fetched
serially and individually from the hardware with the corresponding risk
at that level too?  Then I agree with the remark in the manpage that
some kind of event-stream interface is needed.



Re: Potential re(4) / netbsd-4 / i386 problem?

2010-07-25 Thread der Mouse
>> Actually, my main reason for writing is to mention that I have a
>> laptop, running 4.0.1, with an re onboard, and have never seen such
>> random crashes.  I can give more details if they matter.
> I've got 3 motherboards with re onboard that I've tested, 2 of the 3
> have the problem.  I checked the re hwrev and the one that works fine
> is 0x2800.  The 2 boards that don't work have hwrev 0x3800
> and 0x3C40.  The board that's fine is a commercial Intel DG41MJ
> while the other 2 are both DFI industrial boards (LT600-DR, LT330-B).

My laptop is a Sony Vaio (PCG-5G3L).  The re is

re0 at pci3 dev 0 function 0pci_mem_find: void region
: RealTek 8100E/8101E PCIe 10/100BaseTX (rev. 0x01)
re0: interrupting at ioapic0 pin 18 (irq 7)
re0: Ethernet address 00:13:a9:f2:6f:af
re0: using 256 tx descriptors
rlphy0 at re0 phy 7: RTL8201L 10/100 media interface, rev. 1
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

I don't see anything there that looks like the rev numbers you're
talking about.  While now is not a good time, I'll have a look at the
code and see if I can find the hwrev value you're talking about and
print out its value for my hardware.



Re: RFC: device flavours

2010-07-25 Thread der Mouse
> I'm looking for comments about what I call "device flavours".  [...]

I'm having trouble seeing what this offers over things (like scsibus)
where an abstraction attaches at real hardware and then other things
attach to the abstraction.

> flavour acpiib at pci: acpinodebus
> file    dev/acpi/acpiib.c       acpiib
> 
> flavour ichlpc at pci:  acpipmtimer, sysmon_wdog, fwhichbus, hpetichbus, gpiobus
> file    arch/x86/pci/ichlpc.c   ichlpc

> flavours ichlpc, acpiib
> npx*    at pcib?
> gpio*   at gpiobus?

> I've been wondering about simply allowing more than one driver to
> attach to a device,

It seems to me that we already have something effectively the same as
that, mediated by a "controller" driver.  For example, consider the
way, on sparc, zs attaches to the chip and then zstty or kbd or ms
attaches to zs (or at least that's how it used to work).

You write that

> But the main point is that a flavour can be created without the main
> driver being aware of it;

but, again, it looks to me as though we already have that: to return to
the zs example, the zs code does not need to know anything about the
list "zstty, kbd, ms" in order for those child devices to work.

But I feel certain you are already familiar with all that.  So I must
be missing something.  What?



Re: Potential re(4) / netbsd-4 / i386 problem?

2010-07-23 Thread der Mouse
> is it possible that the re device is writing past its buffer (via
> DMA) and overwriting random memory ?

Isn't that one thing the iommu is for?  Oh, wait

Well, use machines whose designers cut corners on hardware design and
guess what happens.

Actually, my main reason for writing is to mention that I have a
laptop, running 4.0.1, with an re onboard, and have never seen such
random crashes.  I can give more details if they matter.



Re: The missing membar_X() directive

2010-07-14 Thread der Mouse
> IIRC, you cannot implement RCU in non GPL software (unless IBM gives
> approval for it).

Why not?

Even if it's patented, it's unlikely to be patented anywhere but the
USA (certainly the Wikipedia page gives no reason to think so); there's
no reason the rest of the world should have to suffer the
boneheadedness the USA has chosen to impose on itself.

Since NetBSD is a USA entity, there _is_ a reason for NetBSD to put up
with the USA's idiocies, but I see no reason someone in the sane world
can't implement it and make patches (to NetBSD or anything else)
available.

I suspect the USA patent is invalid, too, though (especially given its
date) it's unlikely anyone with deep enough pockets to take it on
cares.



Re: MP locking?

2010-07-01 Thread der Mouse
>> [...roll forward pre-MP code to become MP-ready...]

>> lock(9) outlines locking facilities which I believe I can use to do
>> [the locking] I want - but there are other issues [...]

> You need memory barriers, modern CPUs can do speculative reads, and
> some can reorder writes.  AFAIK, lock operations act as memory
> barriers.  atomic ops do not.

But are lock calls enough?  As I understand a memory barrier (which
understanding is mostly based on CPU documentation of barrier
instructions), it is not enough, since it does not imply any cache
synchronization with other CPUs.  If CPU A writes a shared data
structure and then issues a memory barrier, either (i) this has no
particular effect on CPU B's cache, leading B to possibly use stale
data, or (ii) this forces CPU B to discard its whole read cache,
incurring a mostly (possibly entirely) unnecessary performance hit.

(ii) can be avoided if memory barriers apply to only certain addresses,
but, since the locking calls do not take any addresses except those of
the lock, the implicit memory barrier you refer to cannot have that
kind of information associated with it.  (This actually leads me to
wonder: are those implicit memory barriers good for anything besides
device access?)  I find it hard to believe that releasing a lock causes
all other CPUs to discard their entire read caches.

This leaves me with (i).  I know some cache hardware does snooping of
writes by other CPUs (or, semi-equivalently, DMA engines) to avoid the
issue entirely, but I also know some hardware doesn't.  How does NetBSD
address the issue?  The only thing "man -k barrier" turned up, besides
a handful of pthread calls, is bus_space(9), which AIUI is not relevant
here because these are data structures in ordinary kernel memory, not
bus space - am I missing something?  Does this just mean that 4.0.1
(which is what I was checking the manpages on) does not support MP on
hardware which doesn't do cache snooping, or is there some facility
I've missed for arranging for other-CPU cache flushes, or what?



Re: MP locking?

2010-07-01 Thread der Mouse
> Having just recently spent quite a bit of time porting a kernel
> module from NetBSD-3 to NetBSD-5, and working out a bunch of
> synchronization issues, I suggest skipping over NetBSD-4 and going
> straight for 5.

Noted, though I doubt a major version jump is in the cards for that
machine in the foreseeable future.

And, really, because it still needs to run in the original pre-MP
kernel with minimal changes, what I'm going to be doing is pulling all
the locking out so those differences can be hidden from everything but
a relatively small and well-defined piece of the code, making further
ports to the locking scheme du jour relatively simple.
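
A rough sketch of the shape such an isolation layer could take follows.
Every name in it (mydrv_lock_t, mydrv_lock_enter(), struct mydrv_softc
and so on) is hypothetical, and the backend shown is only a userland
stub that tracks whether the lock is held; in a real port those three
small functions are where spl*() or an MP lock primitive would go, and
nothing else in the driver would need to know which.

```c
#include <assert.h>

/* Hypothetical wrapper type.  The stub backend just records ownership
   so that misuse (double enter, exit without enter) trips an assert. */
typedef struct mydrv_lock {
    int held;
} mydrv_lock_t;

static void mydrv_lock_init(mydrv_lock_t *l)
{
    l->held = 0;
}

static void mydrv_lock_enter(mydrv_lock_t *l)
{
    assert(!l->held);   /* a real backend would block or raise spl here */
    l->held = 1;
}

static void mydrv_lock_exit(mydrv_lock_t *l)
{
    assert(l->held);    /* a real backend would release or splx() here */
    l->held = 0;
}

/* Driver code is written only against the wrapper above. */
struct mydrv_softc {
    mydrv_lock_t lock;
    int pending;        /* shared state protected by lock */
};

/* Atomically (with respect to the lock) take and clear the counter. */
static int mydrv_take_pending(struct mydrv_softc *sc)
{
    int n;

    mydrv_lock_enter(&sc->lock);
    n = sc->pending;
    sc->pending = 0;
    mydrv_lock_exit(&sc->lock);
    return n;
}
```

The point of the design is that moving to a different locking scheme
means rewriting the three mydrv_lock_* functions and nothing that calls
them.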



MP locking?

2010-06-30 Thread der Mouse
I have some kernel code which was written for a pre-MP kernel; it uses
spl*() for locking.  I'd like to roll this forward to something at
least slightly more modern - specifically, a dual-CPU 4.0.1 machine.

lock(9) outlines locking facilities which I believe I can use to do
what I want - but there are other issues, such as cache coherency; do I
need to do anything special with shared data structures to ensure
coherency between processors?  Is it enough to declare them volatile,
or do I need memory barriers as well, or what?  To what extent can I
use locks from within code invoked by callouts?



Re: systems hangs with slow disk - how does FS locking work?

2010-06-29 Thread der Mouse
> My assumption is that the "big" data block written is divided in
> several small chunks (disk blocks, 512 bytes), which will then be
> written to disk sequentially, and with the system preempting if
> there's other work to do.  The preemption would further delay the
> writes, but write speed is not an issue.

Depends.  Does the driver do DMA, or does it have to go with PIO?

If PIO, you may actually not be blocking all of userland for the whole
duration, but it may be getting only tiny amounts of CPU in between
each underlying transfer and the next, which may look very similar.

Especially if the PIO is slow.

Another possibility is that the underlying hardware uses a block size
larger than 512, like 2K or 4K, and is doing a read-modify-erase-write
cycle for each 512-octet sector.  Erases can be slow.
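
As a back-of-the-envelope illustration of how bad that can get, the
sketch below computes how many bytes the device physically rewrites in
the worst case, assuming every sector-sized request triggers a full
read-modify-erase-write of one block and the device does no coalescing.
The function name and the example sizes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes physically rewritten when `total` bytes are written one
   `sector`-sized request at a time and each request forces a
   read-modify-erase-write of a whole `block`.  Worst case only: a
   device that merges adjacent sectors would do better. */
static size_t rmw_bytes_rewritten(size_t total, size_t sector, size_t block)
{
    assert(sector > 0 && block >= sector && block % sector == 0);
    size_t requests = (total + sector - 1) / sector;  /* round up */
    return requests * block;
}
```

For a 64 KB write issued as 512-octet sectors to a device with 4K
blocks, that is 128 requests and 512 KB rewritten, i.e. eight times the
data and 128 erase cycles instead of 16.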


